specified by the CFA committee. Each test program was coded from two to four
times on each architecture to minimize the effect of programmer variability. An
Architecture        M       R

PDP-11            0.93    0.94
IBM S/370         1.27    1.29
Interdata 8/32    0.85    0.83


                Table 1


In other words, our test program results indicate that the IBM S/370 needs
[illegible]% more memory than the Interdata 8/32 to represent the set of test
programs (or 21% more than the average of the three architectures) and the
PDP-11 is essentially average in its use of memory. Similarly, the PDP-11's
ability to "execute" the test programs ranges between 93% and 94% of the
average execution time based on the M and R measures, respectively. Across the
test programs used in this study, these results show that for all three
measures the Interdata 8/32 is the superior computer architecture, followed by
the PDP-11 and then the IBM S/370.
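
The percentages quoted above can be read directly off the normalized entries of
Table 1. The following is a minimal sketch, assuming each entry is an
architecture's measure divided by the average of the three architectures; the
dictionary and function names are illustrative and are not part of the study.

```python
# Normalized M and R values as read from Table 1 (1.00 = three-architecture average).
TABLE_1 = {
    "PDP-11":         {"M": 0.93, "R": 0.94},
    "IBM S/370":      {"M": 1.27, "R": 1.29},
    "Interdata 8/32": {"M": 0.85, "R": 0.83},
}


def percent_vs_average(arch, measure):
    """Percent above (+) or below (-) the three-architecture average."""
    return round((TABLE_1[arch][measure] - 1.0) * 100, 1)


def percent_vs(arch, other, measure):
    """Percent by which `arch` exceeds `other` on the given measure."""
    ratio = TABLE_1[arch][measure] / TABLE_1[other][measure]
    return round((ratio - 1.0) * 100, 1)


if __name__ == "__main__":
    for arch in TABLE_1:
        print(arch, percent_vs_average(arch, "M"), percent_vs_average(arch, "R"))
    print(percent_vs("IBM S/370", "Interdata 8/32", "M"))
```

Under this reading, a PDP-11 entry of 0.93 corresponds to 7% less processor/memory
traffic than the average, which is the sense in which the text speaks of 93% of
the average.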
1. INTRODUCTION


While there are many useful parameters of a computer architecture that can be
determined directly from the principles of operation manual, the only method known
to be a realistic, practical test of the quality of a computer architecture is to
evaluate its performance against a set of benchmarks, or test programs. In Volume
II we presented a set of absolute and quantitative criteria that the CFA committee
felt provided some indication of the quality of the candidate computer architectures.
It is important to emphasize, however, that throughout the discussion of these
criteria it was understood that a benchmarking phase would be needed, and that many
of the quantitative criteria were being used to help construct a reasonable
"prefilter" that would help to reduce the number of candidate computer architectures
from the original nine to three or four. As described in Volume II, this initial
screening in fact reduced the set of candidate computer architectures to three: the
IBM S/370, the PDP-11, and the Interdata 8/32.

The concept of writing benchmarks, or test programs, is not a new idea in the
field of computer performance evaluation. For the purpose of the CFA committee, we
define a test program to be a relatively small program (100 to 500 machine
instructions) that was selected as representative of a class of programs. The CFA
committee's test program evaluation study described here must also address the
central problems facing conventional benchmarking studies:

a.  How  is  a representative  set  of  test  programs  selected? 

b. Given limited manpower, how are programmers assigned to writing test
programs in order to maximize the information that can be gained?

We face an additional problem here because we are evaluating computer
architectures, independent of any of their specific implementations. In other
words, when evaluating particular computers, time is the natural measure of how
fast a test program can be executed. However, a computer architecture does not
specify the execution time of any instructions. Thus an alternative to time must
be chosen as a metric of execution speed.

This volume explains how the CFA committee addressed the above questions and
presents the results of the test program evaluation of the three candidate
architectures. The next section, Section 2, describes how the 12 test programs used
in the evaluation process were selected, and Appendix A gives the actual
specifications of the test programs used by the programmers to code the programs for
the candidate architectures. Section 3 explains the measures of architecture
performance that were used in this study. Section 4 explains how 16 programmers were
assigned from six to nine programs each, in order to get a set of slightly over 100
test program implementations that were used to compare the relative performance of
the candidate architectures. The principal results of the test program evaluation
are presented in Section 6, and Appendix E contains the actual S, M, and R
measurements of all of the test programs. The analysis of variance procedure, and
related procedures, used to analyze the test program statistics are also reviewed.

We conclude this report with a brief summary and discussion of our experiences and
problems that should prove useful for any further work in this area.
2. TEST PROGRAM SPECIFICATION

A Test Program Subcommittee was appointed at the first CFA committee meeting
of 1 and 2 October 1975 and was charged with:

(1)  determining  the  general  criteria  for  test  programs 

(2)  developing  characteristics  to  be  tested 

(3)  specifying  specific  test  programs 

Members were:

Bill Burr (Chairman)    USAECOM

LT Belton Allen         NPGS

Mark Stephens           Pi'ITC

Forrest Sumioer         MTEC

William McCoy           NSWC, Dahlgren

The Test Program Subcommittee met on 20 and 21 October 1975 to formulate
recommendations for the December 1975 meeting of the CFA committee.

The ultimate goal of the subcommittee was to design a procedure that would
provide data about the relative efficiencies of the candidate computer
architectures, and not data about any particular implementation of an architecture.
In the conventional benchmark approach, the performance of a particular computer on
representative programs (usually coded in a Higher Order Language common to all
the computers to be measured) is measured against time. This would not satisfy
test program objectives because it measures an actual implementation of an
architecture. The subcommittee decided that the measures of architectural merit must
be time independent, and would have to be based on the number of bits of memory
required to represent a particular specified algorithm and the number of bits
which were transferred between the processor and main memory to execute an
algorithm. The next section, Section 3, describes in detail the measures of
architecture performance used in this study.

a. Alternative Approaches

A number of alternative test program specifications were considered by the
Test Program Subcommittee and the entire CFA committee. The major alternatives
are discussed here.

(1) Higher Order Language Test Programs

A tempting proposal was to use test programs written in a Higher Order
Language (HOL), but to use time independent measures. This had the advantage of
allowing a single HOL source program to be used for all the architectures to be
tested. This also would have permitted the use of existing benchmark programs,
which were coded in FORTRAN, which were available from several sources (FCDSSA
and NADC), and which were extracted from "real" military systems. One
disadvantage of this approach was that no one language, even FORTRAN, was available
on all the nine initial candidate architectures, and those languages developed for
use in tactical military applications (e.g., JOVIAL, CMS-2, CS-4, and TACPOL)
were each available on only a few of the candidate architectures. There are
FORTRAN IV and COBOL compilers available for each of the three final candidate
architectures; however, neither FORTRAN nor COBOL is widely used in tactical
military applications. The major disadvantage, however, was that there was no
practical way to separate the effects of compiler quality from the effects of
architectural efficiency, and the object of the experiment was to measure only
the architecture. The HOL approach to the test program effort was ruled out,
because the results would necessarily involve a significant undetermined
component, which would be due to variations in the efficiency of compilers which are
unlikely to be extensively used in tactical military applications. These
unmeasurable compiler effects might well mask genuine differences in the
intrinsic efficiencies of the architectures.

(2)  Synthetic  Programs 
Another test program approach proposed the use of synthetic programs.
Synthetic programs are highly parameterized programs, written in either
assembly or higher-level languages, and designed to provide a "representative"
computational and I/O load on a computer system [Buchholz, 1969]. Synthetic
programs do not do any useful computation, but rather are structured as a set
of loops, controlled by the input parameters of the synthetic program, that
compute, initiate I/O activity, or request operating system services.
Synthetic loads have been most successful in testing computer systems against
varying multiprogramming job mixes and for evaluating system configurations.
Synthetic programs were not used here because we were interested in how well
an architecture's instruction set could represent and execute specified
algorithms.

(3)  Code  Test  Programs  in  Assembly  Language 

Using standard (Machine-Oriented) assembly language for the test
programs was the obvious alternative to the use of Higher Order Languages, but
it had several obvious disadvantages. First, each program would have to be
recoded for each machine, adding to the effort involved. Moreover, this
introduced programmer variability into the experiment, and previous studies
have shown programmer variability to be large (variations of factors of 4:1
or more are commonly accepted). Finally, it is much more expensive to code
in assembly language than in HOLs, and this would limit the size or number
of the test programs. Nevertheless, the committee felt that there were ways
to limit, separate, and measure these programmer effects, while there was no
practical way to limit or separate the effects of compiler efficiency. It
was therefore decided that the test programs would, of necessity, be coded
in assembly language.

b.  Guidelines  for  Test  Programs  Specification 

The Test Program Subcommittee attempted to establish a strategy for
defining and coding the test programs that would minimize the variability due
to differences in programmer skill. The strategy devised was as follows:

(1) The test programs would be small "kernel" type programs, of not
more than 200 machine instructions. (In the end, a few test programs
required more than 200 instructions.) It was felt that only small programs
could be specified and controlled with sufficient precision to minimize the
effects of programmer variability. Moreover, insufficient resources were
available to define, code, test, and measure a significant set of larger
programs.

(2) Mr. W. L. McCoy proposed that the programs be defined as structured
programs, using a PL/1-like Program Definition Language (PDL), and then "hand
translated" into the assembly languages of the respective architectures.
This proposal is essentially the IBM Structured Assembly Language methodology,
and it was adopted by the subcommittee. It was recognized that PDL
definitions would be impractical for some test programs, specifically those intended
to measure interrupt handling.

(3) Programmers would not be permitted to make algorithmic improvements
or modifications, but rather would be required to translate the PDL
descriptions into assembly language. Programmers were free to optimize their test
programs to the extent possible with highly optimizing compilers. This "hand
translation" procedure of strictly defined algorithms was expected to reduce
variations due to programmer skill.

(4) All test programs except the I/O Interrupt Test Program would be coded
as reentrant, position-independent (or self-relocating) subroutines. This was
believed to be consistent with the best contemporary programming practice and
provides a good test of an architecture's subroutine and addressing capabilities.

c.  Selection  of  the  Twelve  Test  Programs 


The Test Program Subcommittee then specified a set of lb proposed test
program kernels. These kernels were intended to be broadly representative of the
basic types of operations performed in military computer systems. These test
programs and the proposed methodology were presented to the CFA Committee at its
second meeting on 3 and 4 December 1975. Several additional test program
kernels were then proposed by Dr. P. Cordon and Mr. W. L. McCoy. An expanded set
of proposed test programs was prepared. This expanded set was then presented
to the CFA Committee at its third meeting in March 1976.

At that meeting CFA Committee members were asked to rank the test programs
in order of their importance. A composite rating of each of the test programs
was made, and it was agreed that the top 12 programs would be the basis of the
test program experiment. An ad hoc subcommittee was formed to finalize the
detailed specification of the test programs. This subcommittee met on 11! March
1976 at NRL. Its members were:


S. Fuller (Chairman)    NRL/CMU
W. Burr                 USAECOM
L. Haynes               NSWC, White Oak
W. McCoy                NSWC, Dahlgren
L. DeNoia               NUSC, New London
D. Parnas               NRL/Darmstadt


This subcommittee reviewed the 12 test programs designated by the CFA
Committee and produced a final revised set of Test Program Specifications. The
specifications of the 12 test programs are given in Appendix A and they are
briefly described below:

(1) I/O kernel, four priority levels, requires the processor to field
interrupts from four devices, each of which has its own priority level. While
one device is being processed, interrupts from higher priority devices are
allowed.


(2) I/O kernel, FIFO processing, also fields interrupts from four devices,
but without consideration of priority level. Instead, each interrupt causes a
request for processing to be queued; requests are processed in FIFO order. While
a request is being processed, interrupts from other devices are allowed.

(3) I/O device handler, processes application programs' requests for I/O
block transfers on a typical tape drive, and returns the status of the transfer
upon completion.

(4) Large FFT, computes the fast Fourier transform of a large vector
of 32-bit floating point complex numbers. This benchmark exercises the
machine's floating point instructions, but principally tests its ability to
manage a large address space. (Up to one half of a million bytes may be
required for the vector.)

(5) Character search, searches a potentially large character string
for the first occurrence of a potentially large argument string. It
exercises the ability to move through character strings sequentially.

(6) Bit test, set, or reset tests the initial value of a bit within a
bit string, then optionally sets or resets the bit. It tests one kind of bit
manipulation.

(7) Runge-Kutta integration numerically integrates a simple differential
equation using third-order Runge-Kutta integration. It tests floating-point
arithmetic.

(8) Linked list insertion inserts a new entry in a doubly-linked list.
It tests pointer manipulation.

(9) Quicksort sorts a potentially large vector of fixed-length strings
using the Quicksort algorithm. Like FFT, it tests the ability to manipulate
a large address space, but it also tests the ability of the machine to support
recursive routines.

(10) ASCII to floating point converts an ASCII string to a floating point
number. It exercises character-to-numeric conversion.

(11) Boolean matrix transpose transposes a square, tightly-packed bit
matrix. It tests the ability to sequence through bit vectors by arbitrary
increments.

(12) Virtual memory space exchange changes the virtual memory mapping
context of the processor.

The specifications, written in the Program Definition Language, were
intended to completely specify the algorithm to be used, but allow a programmer
the freedom to implement the details of the program in whatever way best suited
the architecture involved. For example, in the ASCII-to-floating-point
benchmark, program J, the PDL specification included the statement:

    NUMBER ← integer equivalent of characters POSITION to J-1 of A1
             where character J of A1 is "."

This description instructs the programmer to convert the character substring at
positions POSITION, POSITION+1, ..., J-1 to an integer and store the result in the
integer NUMBER. It left up to the programmer whether he would sequence through the
string character-by-character, accumulating an integer number until he found a
dot, or perhaps (on the S/370) use the Translate-and-Test (TRT) instruction to
find the dot, then use PACK and Convert-to-Binary (CVB) to do the conversion.
It did forbid him to accumulate the result as a floating point number directly,
forcing him to first convert to an integer, and then to floating point.
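
The two permitted strategies can be sketched as follows. This is only an
illustration in Python rather than assembly language; the function names, the
zero-based indexing, and the assumptions that the substring holds only decimal
digits and that a dot is present are ours, not part of the PDL specification.

```python
def integer_before_dot(a1, position):
    """Return (NUMBER, J) where J indexes the first '.' at or after `position`
    and NUMBER is the integer value of a1[position:J].

    Assumes (hypothetically) zero-based indexing, digits only, and a dot present.
    """
    number = 0
    j = position
    # Character-by-character accumulation, done entirely in integer arithmetic.
    while a1[j] != ".":
        number = number * 10 + (ord(a1[j]) - ord("0"))
        j += 1
    return number, j


def integer_before_dot_bulk(a1, position):
    """Same result, but locate the dot first and convert the substring in one
    step, loosely analogous to a TRT / PACK / CVB sequence on the S/370."""
    j = a1.index(".", position)
    return int(a1[position:j]), j


number, j = integer_before_dot("12345.678", 0)
value = float(number)  # conversion to floating point happens only afterwards
```

Either routine satisfies the statement; what the specification rules out is
building `value` directly in floating point while scanning the digits.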

For another example, the Boolean Matrix Transpose specification, program K,
included the statement

    swap B[I,J] with B[J,I]

where B is a tightly-packed boolean matrix. An earlier version of the
specification said instead

    TEMP ← B[I,J]
    B[I,J] ← B[J,I]
    B[J,I] ← TEMP

A strict interpretation of this latter example would force the programmer to do
explicit bit fetches and stores, as in the Bit Test, Set, or Reset program, F.
The later specification allowed more flexibility.
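
A minimal sketch of the transpose on a packed bit matrix is given below. It
uses explicit single-bit fetches and stores, i.e. the strict reading of the
earlier specification; the "swap" wording would equally permit a
word-at-a-time technique. The row-major, most-significant-bit-first packing and
all function names are assumptions made for illustration only.

```python
def get_bit(bits, n, i, j):
    """Return element [i, j] of an n x n bit matrix packed row-major,
    eight elements per byte, most significant bit first (assumed layout)."""
    k = i * n + j
    return (bits[k >> 3] >> (7 - (k & 7))) & 1


def set_bit(bits, n, i, j, value):
    """Store `value` (0 or 1) into element [i, j] of the packed matrix."""
    k = i * n + j
    mask = 1 << (7 - (k & 7))
    if value:
        bits[k >> 3] |= mask
    else:
        bits[k >> 3] &= ~mask & 0xFF


def transpose(bits, n):
    """Transpose in place by swapping B[i, j] with B[j, i] above the diagonal."""
    for i in range(n):
        for j in range(i + 1, n):
            a, b = get_bit(bits, n, i, j), get_bit(bits, n, j, i)
            if a != b:
                set_bit(bits, n, i, j, b)
                set_bit(bits, n, j, i, a)
    return bits


# 3 x 3 example: rows 110, 001, 000 packed into two bytes.
matrix = bytearray([0xC4, 0x00])
transpose(matrix, 3)
```

Because rows are not byte-aligned when n is not a multiple of eight, the bit
index advances by n between rows, which is the "arbitrary increment" access
pattern this test program is meant to exercise.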

d. PROCEDURES FOR DEBUGGING AND MEASURING THE TEST PROGRAMS

The test programs were written by 16 programmers at various Army and Navy
laboratories and at Carnegie-Mellon University. A set of reasonably
comprehensive instructions and conventions was needed to ensure that the various
programmers produced results that could be compared in a meaningful way. Section 4 of
this volume discusses the assignments made to the programmers, and shows how these
assignments were made to minimize the distortion of the final conclusions due to
variations between programmers. In addition, we also agreed that it was not
sufficient to just write the test programs in assembly language. We instructed each
programmer that all of the test programs that he wrote had to be assembled and run
on the appropriate computer.* Appendix B is a copy of the test data that we
distributed to the programmers. A test program was defined to be debugged for the
purposes of the CFA committee's work if it performed correctly on the test data
shown in Appendix B. Also note that some of the input parameters are marked to
indicate they are the data that was used when the test program's execution
performance was measured.

Appendix C describes the details of the subroutine calling conventions assumed
for each of the three final candidate architectures. These calling conventions were
used by driver programs in the debugging and testing of the test programs. These
calling conventions were designed to make as efficient use of each architecture as
possible, consistent with the constraint of requiring the test programs to be
reentrant, position-independent subroutines.


* The exceptions were test programs A, B, C, and L, since they all require
the use of privileged instructions and it was impractical to require
programmers to get stand-alone use of all the candidate machines. In these
four cases, an "expert" on a test program was designated and he was
responsible for reading in detail all implementations of the test program and
returning the test programs to the programmer for correction if he detected
any errors.
Very little has been done in the past to quantify the relative (or absolute)
performance of computer architectures, independent of specific implementations.
Hence, like it or not, we had little choice but to define measures of architecture
performance for ourselves.

Fundamentally, performance of computers is measured in units of space and time.
To put it another way, the resources that are used to solve problems on computers
have units of space and time (e.g., computer charges commonly include terms of
processor seconds used, primary memory space used, and secondary storage space used).

The measures that were used by the CFA Committee to measure a computer architecture's
performance on the test programs were:

Measure of Space:

S: Number of bytes used to represent a test program.

Measures of Execution Time:

M: Number of bytes transferred between primary memory and the processor
during the execution of the test program.

R: Number of bytes transferred among internal registers of the processor
during execution of the test program.

The remainder of this section will develop exact definitions of these three
measures. Terms such as internal register and processor will be given very
specific definitions and the constraints on the types of byte transfers will be
defined.

All of the measures described in this section are measured in units of 8-bit
bytes. A more fundamental unit of measure would be bits, but we faced a number of
annoying problems with respect to carry propagation and field alignment that make
the measurement of S, M, and R in bits unduly complex. Fortunately, all the
computer architectures under consideration by the CFA committee were based on 8-bit
bytes (rather than 6, 7, or 9-bit bytes) and hence the byte unit of measurement
can be conveniently applied to all these machines.

a. TEST PROGRAM SIZE

An important indication of how well an architecture is suited for an
application (test program) is the amount of memory needed to represent it. We define
S(i,j,k) to be the number of 8-bit bytes of memory used by programmer k to
represent test program i in the machine language of architecture j. The S measure
includes all instructions, indirect addresses, and temporary work areas required by
the program.

The only memory requirement not included in S is the memory needed to hold
the actual data structures, or parameters, specified for use by the test programs.
For example, in the Fourier transform test program S did not include the space for
the actual vector of complex floating point numbers being transformed, but it did
include pointers used as indices into the vector, loop counters, booleans required
by the program, and save-areas to hold the original contents of registers used in
the computation.

If one or more of the final candidate architectures had substantially
different data types (as would have happened if the B6700 had remained a candidate
architecture into the test program phase) we would have had to carefully define the
precision required by the input and output data rather than simply defining the
actual format of the data. Fortunately, the IBM S/370, PDP-11, and the Interdata
8/32 all have 8-bit characters, 16-bit integers, and 32-bit and 64-bit floating
point numbers. The fact that some computer architectures support data types not
supported by other architectures is not a problem here (e.g., the IBM S/370 supports
decimal numbers and 128-bit floating point numbers and the other candidate
architectures do not). Test programs simply specify the format of their input data
structures and, if a candidate architecture does not have instructions to directly
support a data type, it provides support via emulation in software.

Given the three architectures being compared, the only point at which there
is some complication is with the floating point formats. Even though all three
architectures support 32-bit and 64-bit floating point numbers, they use different
formats and this leads to slightly different amounts of precision. For example,
the IBM S/370 and Interdata 8/32 use a base of 16 for their exponent while the PDP-11
uses a base of 2 and, in addition, the PDP-11 uses the "hidden bit" technique to
pick up a little more precision. These different formats will result in slightly
different degrees of precision in the results [Brent, 1973]; pathological cases
could be constructed where the 32-bit floating point representation in one
architecture will enable an algorithm to converge to a reasonable answer while the
32-bit format of one of the other architectures will not converge, and hence the
64-bit format would be required. Clearly we would have been in more serious
difficulty if the PDP-10 with its 36-bit format or the UYK-7 with its 48-bit floating
point format (16-bit exponent, 32-bit mantissa) had needed to be evaluated.
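
The size of the precision difference between the 32-bit formats can be
estimated with a short calculation. The sketch below assumes the usual field
widths for these formats (a 24-bit fraction for the base-16 short format, and a
23-bit stored fraction plus the hidden bit for the PDP-11); those widths and
the function name are assumptions of this illustration and are not taken from
the report.

```python
from math import log2


def significant_bits(fraction_bits, base, hidden_bit=False):
    """Return (worst, best) number of significant binary digits carried by a
    normalized fraction.

    A base-16 fraction is normalized when its leading hexadecimal digit is
    nonzero, so as many as 3 of its leading bits may still be zero.
    """
    bits_per_digit = int(log2(base))
    best = fraction_bits + (1 if hidden_bit else 0)
    worst = best - (bits_per_digit - 1)
    return worst, best


hex_short = significant_bits(24, 16)                  # S/370 and 8/32 style
binary_short = significant_bits(23, 2, hidden_bit=True)  # PDP-11 style
```

Under these assumptions the base-16 format carries between 21 and 24
significant bits depending on the value, while the base-2 format with a hidden
bit always carries 24.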

b. PROCESSOR EXECUTION RATE MEASURES

In selecting among computer architectures as opposed to alternative computer
systems, we are faced with a fundamental dilemma: one of the most basic measures
of a computer is the speed with which it can solve problems, yet a computer
architecture is an abstract description of a computer that does not define the time
required to perform any operation. (In fact, it is exactly this time-independence
that makes the concept of a computer architecture so attractive!) Given this
dilemma, one reaction might be to ignore performance when selecting among alternative
computer architectures and leave it to the engineers implementing the various
physical realizations to worry about execution speed. However, to adopt this attitude
would invite disaster. In other words, although we are evaluating architectures,
not implementations, it is essential that the architecture selected yield
cost-effective implementations, i.e., the architecture must be "implementable."

The M and R measures defined in this section were developed to measure those
aspects of the architecture that will affect the performance of implementations of
the architecture.

(1) Processor/Memory Transfers

If there is any single, scalar quantity that comes close to measuring the
"power" of a computer system, it is the bandwidth between primary memory and the
central processor(s) [Bell and Newell, 1971; LR, 197b; Stone, 197b].

This measure is not concerned with the internal workings of either the
primary memory or the central processor; it is determined by the width of the bus
(W) between primary memory and the processor and the number of transfers per
second the bus is capable of sustaining (see Figure 3-1). Since processor/memory
bandwidth is a good indicator of a computer's execution speed, an important
measure of an architecture's effect on the execution speed of the computer system
is the amount of information it must transfer between primary memory and the
processor during the execution of a program. If one architecture must read or write
2x10**6 bytes in primary memory in order to execute a test program and the second
architecture must read or write 10**6 bytes in order to execute the same test
program, then, given similar implementation constraints, we would expect the
second architecture to be substantially faster than the first.
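
The argument can be made concrete with a small calculation. The sketch below
takes the two traffic figures as 2x10**6 and 10**6 bytes and assumes a
hypothetical bus 16 bits wide sustaining 10**6 transfers per second; the bus
parameters and the function name are illustrative only and describe no
particular machine.

```python
def transfer_time(bytes_moved, bus_width_bits, transfers_per_second):
    """Seconds needed to move `bytes_moved` bytes over a bus that carries
    `bus_width_bits` bits on each transfer."""
    bandwidth = (bus_width_bits / 8) * transfers_per_second  # bytes per second
    return bytes_moved / bandwidth


# Same hypothetical bus for both architectures ("similar implementation constraints").
first = transfer_time(2 * 10**6, bus_width_bits=16, transfers_per_second=10**6)
second = transfer_time(10**6, bus_width_bits=16, transfers_per_second=10**6)
```

With identical bus parameters the ratio of the two times equals the ratio of
the bytes moved, so the comparison between architectures does not depend on the
particular bus assumed.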


[Figure: Primary Memory and Processor joined by a bus W bits wide]

Figure 3-1


The particular measure of primary-memory/central-processor transfers used by
the CFA Committee is called the M measure. M(i,j,k) is the number of 8-bit bytes
that must be read or written from primary memory by the processor(s) of computer
architecture j during the execution of test program i as written by programmer k.
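
As an illustration of how such a count accumulates, the sketch below tallies M
for a made-up instruction trace. It assumes that each executed instruction
contributes the bytes fetched for the instruction itself plus the bytes of any
memory operand it reads or writes; the trace, the byte counts, and all names
are hypothetical and do not correspond to any of the twelve test programs.

```python
from dataclasses import dataclass


@dataclass
class Step:
    instruction_bytes: int  # bytes fetched for the instruction itself
    operand_bytes: int      # bytes of data read from or written to memory
    times_executed: int     # how often the instruction runs


def m_measure(trace):
    """Total bytes moved between primary memory and the processor."""
    return sum((s.instruction_bytes + s.operand_bytes) * s.times_executed
               for s in trace)


trace = [
    Step(4, 0, 1),    # register setup, no memory operand
    Step(4, 4, 10),   # load a 4-byte operand from memory
    Step(2, 0, 10),   # register-to-register add
    Step(4, 4, 10),   # store a 4-byte result
    Step(4, 0, 10),   # loop-closing branch
]
total = m_measure(trace)
```

The tally depends only on the instruction encodings and operand sizes, not on
how long any instruction takes, which is what makes it usable as an
architecture-level measure.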

Clearly, there are implementation techniques used in the design of processors
and memories to improve performance by attempting to reduce processor/memory
traffic, i.e., cache memories, instruction lookahead (or behind) buffers, and other
buffering schemes. These are important implementation techniques that are used in
high-performance models of any architecture, and the structure of the architecture
can affect the degree to which these implementation techniques will speed up
execution. However, with the intention of keeping our measure of processor/memory
traffic as simple, clean, and implementation-independent as possible, none of these
buffering techniques will be considered. At the completion of one instruction, and
before the initiation of the next instruction, the only information contained in the
processor is the contents of the registers in the processor state. (Refer back to
Volume II for the definition of processor state.)

Figure 3-2 may help clarify what I/O traffic is included in the M measure. All
the bytes that are transferred to execute a test program across the processor/memory
bus and the I/O control bus are included in the M measure.

In many simple computer systems the I/O processors (or I/O channels) are
replaced by DMA (direct memory access) controllers or even simpler interfaces
under direct (central processor) program control.


Table 3-1 shows an example of a small IBM S/370 instruction sequence which
should help illustrate the calculation of M. The instructions are the basic loop
of an inner product of two floating point vectors.

No.    Label  Instruction        Comment                                     M (bytes)

(1)           LA   2,10(0,0)     Set R2 to 10, the length of the vectors.        4
(2)           LA   3,XVEC        Load R3 with starting address of X vector.      4
(3)           LA   4,YVEC        Load R4 with starting address of Y vector.      4
(4)           SDR  2,2           Clear floating point reg. 2.  Use it to         2
                                 accumulate inner product.
(5)           SR   7,7           Clear R7 and use as index into floating         2
                                 point vectors.
                                                                 1 x 16 =       16

(6)    LOOP   LE   4,0(7,3)      Load X(i) into floating point register 4.       8
(7)           ME   4,0(7,4)      Multiply X(i) by Y(i).                          8
(8)           ADR  2,4           Sum := Sum + X(i) * Y(i).                       2
(9)           LA   7,4(0,7)      Increment index by 4 bytes.                     4
(10)          BCT  2,LOOP        Decrement loop count and branch back if         4
                                 not done.
                                                                10 x 26 =      260

(11)          STD  2,SUM         Store double precision result in SUM.          12
                                                                 1 x 12 =       12

                                                                 Total         288

Table 3-1. M Measure for IBM S/370 Inner Product Example


Instruction (1) is a 32-bit instruction (RX format) and requires no operand fetches
from primary memory. Hence the M measure for instruction (1) is 4 bytes.
Instructions (2) and (3) are also LA instructions and also add 4 each to the M measure.
Instruction (4) is a 16-bit instruction (RR format) and hence M = 2. Note that the
fact that (4) initiates an 8-byte floating point subtraction does not enter into the
calculation of M. Instruction (5) is also an RR-format instruction and adds 2 to the
M measure. Instruction (6), a load, requires a 4-byte instruction fetch plus a 4-byte
data fetch, as does instruction (7). Instruction (8) is an RR instruction which requires
only a 2-byte instruction fetch; instruction (9) is another Load Address requiring
only a 4-byte instruction fetch, and instruction (10) is a branch requiring a 4-byte
instruction fetch. The final instruction, (11), requires a 4-byte instruction fetch
and also an 8-byte store to save the double precision floating point sum.
Instructions (6) through (10) are each executed 10 times and hence their contribution
to M will be the product of the cost of an individual instruction execution and the
number of times it is executed. The total M measure of this instruction sequence
is 288 bytes.
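The arithmetic of this example can be checked with a short sketch. The per-instruction byte counts are the ones stated in the paragraph above; the tuple layout and the function name `m_measure` are illustrative choices made here, not part of the CFA procedure.

```python
# Each entry: (instruction, instruction-fetch bytes, operand bytes, executions).
# Byte counts follow the text above; the data layout itself is an assumption.
INNER_PRODUCT = [
    ("(1) LA",  4, 0, 1),
    ("(2) LA",  4, 0, 1),
    ("(3) LA",  4, 0, 1),
    ("(4) RR floating point subtract", 2, 0, 1),
    ("(5) SR",  2, 0, 1),
    ("(6) LE",  4, 4, 10),   # 4-byte fetch plus 4-byte operand read
    ("(7) ME",  4, 4, 10),
    ("(8) RR floating point add", 2, 0, 10),
    ("(9) LA",  4, 0, 10),
    ("(10) BCT", 4, 0, 10),
    ("(11) store double", 4, 8, 1),  # 4-byte fetch plus 8-byte store
]


def m_measure(sequence):
    """Sum (instruction bytes + operand bytes) * executions over a sequence."""
    return sum((fetch + data) * runs for _, fetch, data, runs in sequence)


setup = m_measure(INNER_PRODUCT[:5])     # instructions (1)-(5)
loop = m_measure(INNER_PRODUCT[5:10])    # instructions (6)-(10), 10 passes
store = m_measure(INNER_PRODUCT[10:])    # instruction (11)
print(setup, loop, store, setup + loop + store)
```

The three partial sums correspond to the straight-line setup, the ten passes through the loop, and the final store.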

Although the M measure has been designed to be as simple and clean a measure
of processor/memory bandwidth as possible, we discuss below the areas where some
clarification is required:

a. Bit and field accessing. Some computer architectures have a set of
instructions for manipulating individual bits in memory. In these cases, count in
the M measure a 1-byte read to simply read a bit from memory, and a 1-byte read
followed by a 1-byte write to fetch the bit from memory, modify the selected bit,
and store it back in memory. The following example, which is described using an
ISP-like notation, illustrates this principle:

SBT  R1,0(R2)                  This is an Interdata 8/32 instruction that sets a
                               bit in memory.  0(R2) specifies the beginning of a
                               bit vector and R1 identifies the bit within the
                               vector that is to be set.

IR <- M[PC:PC+3]               Four bytes of instruction are loaded into the
                               instruction register.

MAR <- ...                     Load Memory Address Register with address of byte
                               containing bit to be set.

Temp<0:7> <- M[MAR]<0:7>       Fetch byte.

Temp<j> <- 1                   Set bit.

M[MAR]<0:7> <- Temp<0:7>       Store byte.

                               M = 6 bytes

(Note that several of the steps in the above example do not involve transfers from
primary memory but only transfers within the central processor registers. A measure
for processor register transfers will be developed in the next section.)

b. Carry Propagation. Another detail that may cause some ambiguity in the
calculation of the M measure is how to handle carry propagation. For example, the
PDP-11 INC instruction increments a memory location by 1. How much should be added
to the M measure to represent this incrementation? If incrementing on the PDP-11
is counted as a 16-bit fetch followed by a 16-bit store, then this will usually be
an overestimate for a PDP-11 implemented with an 8-bit memory bus. With an 8-bit
bus, the PDP-11 would fetch 8 bits, increment this value, store the new byte, and
only if the carry propagated into the high byte would the second 8 bits need to be
modified.

The problem becomes worse when we consider an architecture with 32-bit pointers.
If an increment counts as a full word access and then a full word store, we find that
incrementing a 32-bit number is much more expensive than a 16-bit number, yet in
implementations with memory buses of 16 bits or less, incrementing a 32-bit pointer
will almost always take the same time as incrementing a 16-bit pointer.

In order to not overestimate processor/memory traffic with the M measure and to
prevent the calculation of the M measure from becoming overly complex, all increments
were assumed to require an 8-bit store. Modifications of higher order bytes were
ignored since they occur infrequently.

(2) Register Transfers within the Processor

The processor/memory traffic measure just described is our principal measure
of a computer architecture's execution rate performance. However, it should not be
too surprising that this M measure does not capture all we might want to know about
the implementability of an architecture. In this section a second measure of
architecture performance is defined: R -- register-to-register traffic within the
processor. Whereas the M measure looks at the data traffic between primary memory and
the central processor, R is a measure of the data traffic internal to the central
processor. The fundamental goal of the M and R measures was to enable the architecture
selection committee to construct a processor execution rate measure from M and
R (ultimately an additive measure, aM + bR, where the coefficients a and b can be
varied to model projections of relative primary memory and processor speeds). An
unfortunate but unavoidable property of the R measure is that it is very sensitive
to assumptions about the register and bus structure internal to the processor; in
other words, the "implementation" of the processor.
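A minimal sketch of the additive measure aM + bR follows. The architecture names and the M and R values are hypothetical, chosen only to show how varying the coefficients a and b can change a ranking; they are not CFA results.

```python
def execution_cost(m_bytes, r_bytes, a, b):
    """Additive execution-rate measure a*M + b*R (lower is better)."""
    return a * m_bytes + b * r_bytes


def rank(architectures, a, b):
    """Order architecture names from lowest to highest a*M + b*R."""
    return sorted(
        architectures,
        key=lambda name: execution_cost(*architectures[name], a, b),
    )


# Hypothetical (M, R) pairs in bytes for one test program.
candidates = {"arch_x": (300, 900), "arch_y": (400, 600)}

print(rank(candidates, a=1, b=1))  # memory and registers weighted equally
print(rank(candidates, a=4, b=1))  # slow memory relative to the processor
```

With equal weights the architecture with less register traffic ranks first; when memory transfers are weighted four times as heavily, the architecture with less memory traffic overtakes it.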

(a) Definition of the R measure

The definition of R is based on the idealized internal structure for a
processor shown in Figure 3-3. By using the register structure in Figure 3-3 we do
not imply that this is the way processors ought to be built. On the contrary, the
structure in Figure 3-3 has a much more regular data path structure than would be
practical in contemporary processors. There exist both data paths of marginal
utility and non-existent data paths that, if present, could significantly speed up
the processor. This structure was selected because the very regular data path, ALU,
and register array structure helped simplify our analysis.

R(i,j,k) is defined as the number of 8-bit bytes that are read from and written to
the internal processor registers during execution of test program i, as written by
programmer k, on architecture j.

The ALU in Figure 3-3 is allowed to perform any common integer, floating point,
or decimal arithmetic operation; increment or decrement; and perform arbitrary shift
or rotate operations.

(b) Further Detail and Clarification of the R Measure Definition

The definition of the R measure has been the center of considerable discussion
within the CFA Committee (see Section 3.4 for chronology of development of S, M, and
R). The following discussion and examples are presented here to fully define and
clarify the definition of the R measure.

a. Only Data Traffic Measured

All data traffic is measured in R and no control traffic is measured.
Figure 3-3 is intended to specify what will be defined to be control traffic and what
will be data traffic for the purposes of the R measure. The R measure does not count
the following "control" traffic:

(1) The setting of the condition codes by the ALU (or control unit) and
the use of the condition codes by the control unit. The only time that movement of
data into or out of the Program Status Word will be counted in the R measure is when
a Load PSW instruction is performed or a trap or interrupt sequence moves a new PSW
into or out of the PSW register.

(2) Bits transmitted by the control unit to activate or otherwise control
the register file, ALU, or memory unit are not counted in the R measure.

(3) Reading of the Instruction Register by the control unit as it decodes
the instruction to determine the instruction execution sequence is not counted in
the R measure. In other words, the Instruction Register (with the exception of
displacement fields) will be for most practical purposes a write-only register as far
as the R measure is concerned.

(4) Loading the Memory Address Register is counted in the R measure. But
use of the contents of the Memory Address Register to specify the address of data
to be accessed in primary memory is not counted.

[Figure 3-3: Canonical Processor Architecture]

Among data traffic that was a candidate for exclusion from the R measure was
the incrementation of the program counter. We decided to leave incrementation of
the PC in the R measure, since it is not handled via special circuitry in most
minicomputer implementations, and since we have attempted to minimize the number of
special "control paths" that we needed to define in our canonical CFA processor
structure.

The above four cases of control information excluded from the R measure are
meant to be a total enumeration of the control flow in the canonical CFA processor.
Because of these control paths, three registers in the general register file (i.e.,
the IR, MAR, and PSW) are specialized in their function. All the rest of the
registers in the register file are simply ordinary registers that can present their
contents to either the A or B input of the ALU and which can be loaded from the output
of the ALU.

b. Virtual Address Translation

The virtual to real address translation process is not counted in the R
measure. In other words, the final memory address in the MAR is a virtual address
and the work involved in translating this virtual address to a real address is not
included in the R measure. There are at least two reasons for this decision.

(1) Practical implementations of processors with a virtual to real
translation option do not use the central ALU but in fact use special data paths and
adders to allow the memory translation to go on independent of the basic operations
in the processor.

(2) The virtual to real translation process will not be used by some
members of the CFA.

It should be noted here that the R measure will include the execution of load and
store instructions that are used to set up the control registers defining the virtual
to real translation process. In other words, the control registers of the memory
translation unit are being treated as an "extended PSW" with respect to the R
measure.

c. ALU Operations

Assume the ALU in the canonical CFA processor can directly handle all the
data types of the architecture: signed integer, logical, floating point, and unsigned
integer. The basic reason here is that when performance is of concern the ALU of the
processor is capable of handling the appropriate data types. The alternative would
be to assume a simple processor (i.e., a two's complement adder) and then emulate the
more complex functions such as multiply, floating point operations, etc. The reasons
for going for the "full-function" ALU are the following:

(1) When floating point execution time is of concern, processors are used
that have floating point ALU's. Floating point operations are emulated on simple
two's complement adders when the floating point function is needed but performance
of the floating point is not essential.

(2) Calculation of the R measure for floating point operations assuming
a two's complement adder would be tedious, and given the architectures under
consideration there would be little difference in the R measure for the floating point
operations of the candidate architectures.

d. Intra-Instruction R Optimization

R measures should be determined for the general case of an instruction
(with auxiliary counts to reflect different addressing modes). For example,
recognizing that the instruction SUBTRACT R1,R1 is really a CLEAR R1, and then not
counting the instruction as a (1) Read R1, (2) Read R1, (3) Write R1, but only (1)
Write R1, is not allowed.

Even worse would be to treat as a special case the instruction ADD R1,R2
(where the result is stored back in R1) when R2 = 0 and define the R measure to count
only the read of R2 to determine it is zero. Any reasonable implementation of an
instruction should not try for such optimizations. (Clearly, similar remarks hold
for multiplication by zero and one, shifting by zero, etc.)

e. Inter-Instruction R Optimization

The R measure should not be optimized for a test program via
inter-instruction optimization. For example, each instruction must fetch its own
instruction from primary memory. The idea of assuming an 8-byte or 16-byte IR in the
processor and then fetching a new block of instructions only when the current block
is exhausted is not allowed.

Similarly, the R measure for an instruction cannot assume a little (or
big!) cache in the temporary registers in the register file and only go to primary
memory when the quantity is not in the "cache."

f. Consistency between the M and R Measures

There are a number of tradeoffs that were made in the determination of R
and M. Many of these tradeoffs were probably acceptable whichever way they were
decided, but care had to be taken to be consistent when counting M and R. A good
example here was in instruction fetching. The PDP-11, Interdata 8/32 and IBM S/370
all have 2, 4, and 6 byte instructions. If the R measures had been determined by
always fetching 6 bytes and throwing away the unused bytes, then it would be necessary
to always count the instruction fetch as 6 bytes in the M measure. In fact, all three
candidate architecture R and M measures were determined by assuming that the machines
will only fetch the first two bytes of an instruction and then, based on the operation
code, decide if they need to fetch any more bytes from memory. To ensure
comparability, one individual, William Burr, computed the M and R measure for each
instruction of all three final candidate architectures.

g. Memory Buffer Register

No memory buffer register is specified in Figure 3-3, but any of the
general registers in the register file can act as a memory buffer register. (In other
words, there is no need for any special control lines to be attached to an MBR.)

h. The Cost of Incrementation

For the purpose of calculating the R measure, incrementation need only
involve the low order byte of the register involved. For example, incrementing the
PC by 1 counts as a 1-byte read from the PC and a 1-byte write back into the PC. The
times when the high order bytes of the PC need to be involved in the incrementation
process, because of carry propagation, are sufficiently few that they can be ignored
for the purposes of the R measure. The fact that implementations with 16-bit buses
will route the low order 16 (or 32) bits of the PC to the ALU for incrementation does
not mean the R measure should count a full PC read and a full PC write. (Our
earlier measure of R for 16, 32, 64, etc. bus implementations was designed to model
exactly these "inefficiencies" in wide bus implementations.) Another reason for
treating incrementation as less costly than a full word add is that in
higher performance machines the PC will in fact be implemented as a counter, and
the lower R count for an increment reflects the simple structure of an incrementer
versus a full adder. However, the caution should be repeated here that the R measure
bears a stronger relation to simple rather than high performance implementations of
the architecture.

i. Constants

Constants of +1 (carry), -1 (borrow), all 0's, or all 1's are assumed to
be "free" (i.e., they don't require access to a register file of constants). All
other constants, including 2 and 4, require access to one or more bytes of the constant
register file. As a result, incrementing by 1 costs one byte less than incrementing
by 2.
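The incrementation and constant rules can be combined in a small sketch. It assumes, as an illustration only, that any non-free constant small enough to fit in one byte costs exactly one byte of constant-file access; the function name `r_increment` is hypothetical.

```python
# Increment constants the text treats as free (no constant-file access).
FREE_CONSTANTS = {1, -1}


def r_increment(constant):
    """R cost in bytes of adding a small constant to a register.

    Only the low-order byte of the register is counted (1-byte read plus
    1-byte write).  Assumption: a non-free constant that fits in one byte
    costs one additional byte read from the constant register file.
    """
    if constant == 0 or not -128 <= constant <= 127:
        raise ValueError("sketch covers nonzero one-byte constants only")
    register_traffic = 2
    constant_traffic = 0 if constant in FREE_CONSTANTS else 1
    return register_traffic + constant_traffic


print(r_increment(1), r_increment(2), r_increment(4))
```

Under these assumptions an increment by 1 counts 2 bytes and an increment by 2 or 4 counts 3 bytes, so incrementing by 1 costs one byte less than incrementing by 2.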


j. Shifts

Although Figure 3-3 does not specifically show it, it is assumed that
each architecture has a single register, with appropriate lines into the control unit,
to control the shift amount. The R measure is not incremented for accessing this
register to determine the shift amount; however, it is incremented when that register
is loaded.


k. Multiple Assignment Operations not Allowed

In an earlier definition of the R measure, we allowed operations with
multiple assignments such as:

MAR,PC <- R1.

In other words, MAR and PC were simultaneously loaded with the contents of R1. (For
a 24-bit MAR and PC this would add up to a total R count of 9 bytes.) While many
processor implementations allow such a multiple assignment operation (typically
microprogrammed processors termed "horizontal" microprocessors) there are also many
implementations that do not allow multiple assignments. After extensive discussions
we decided that, since the ISP dialect used by the ISP simulator does not allow
multiple assignments, the canonical CFA processor would not allow multiple assignments,
in order to simplify the collection of R measures.

Hence, the above example must now be split into the following two
operations (with a total R count of 12 rather than 9):

PC <- R1

MAR <- R1.
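The two counts quoted for this example can be reproduced with a short sketch, assuming 24-bit (3-byte) registers as in the text; the function name `r_transfer` is a hypothetical label.

```python
def r_transfer(width_bytes, destinations=1):
    """R cost of one register transfer: one source read plus one write
    per destination register, each `width_bytes` wide."""
    return width_bytes * (1 + destinations)


WIDTH = 3  # a 24-bit MAR and PC, as in the example above

# Multiple assignment MAR,PC <- R1: read R1 once, write MAR and PC.
multiple = r_transfer(WIDTH, destinations=2)

# Canonical CFA processor: two separate single assignments.
split = r_transfer(WIDTH) + r_transfer(WIDTH)

print(multiple, split)
```

The difference of 3 bytes is the second read of R1 that the split form requires.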

l. Registers that Must be Present in the Register File of the CFA Processor

There are registers in the canonical CFA processor structure that must be
in any implementation of the processor. Clearly all the registers in the processor
state of the architecture must be in the register file, since their state must
persist across instruction execution (e.g., general purpose registers, program status,
floating point registers).^

m. Registers not in the architecture but in any processor structure

(1) Instruction Register

(2) Memory Address Register: There must be some distinguished register
that can present an address to primary memory.

(3) Memory buffer register for read-modify-write operations to primary
memory.

(4) Other registers to hold temporary results may be needed by some
architectures because of the complex effective address calculation process or because
of sequential complexities in the execution of the body of the instruction.

There is no penalty associated with a proliferation of temporary registers in the
processor as far as the R measure is concerned. The R measure counts only the
number of reads and writes to a register, not the number of registers needed to
implement an architecture. Hence, there is a natural tendency in laying out the register
file for an architecture to define as many temporaries as there are types of
temporary information.

The proposal was made by several CFA members that we limit the R measure to the
registers in the architecture of the computer. Specifically, this would mean that
the R measure would not count reads and writes to the IR, MAR, memory buffer register,
and any other temporary registers in the register file but not in the architecture.

This proposal has the attractive property that it makes the R measure somewhat easier
to define (and just as importantly makes the R measure easier to measure automatically
on the ISP simulator). The problem with this proposal is that somewhere around 1/3 of
the present R traffic is connected to the nonarchitecture registers in the canonical CFA
processor. Since the canonical CFA processor is meant to model an implementation of
the processor, an R measure restricted to reads and writes of bytes in the registers
in the architecture misses about 1/3 of the processor's internal activity.

n. Counting Byte Reads and Writes Rather than Bus Cycles

There was considerable discussion in the CFA committee as to whether the
R measure should count the number of (micro)cycles of the canonical CFA processor
or the number of bytes read and written from the register file. The decision to
count byte reads and writes was made for the following reasons:

(1) The number of bus cycles to execute an instruction is more sensitive
to a particular processor structure than the number of bytes that are read and
written. Since we would like the R measure to be as "robust" a measure of architecture
implementability as possible, we have chosen the measure less sensitive to the
internal processor bus structure.

(2) The practical difference between byte counts and cycle counts is that
the processor structure allows two or three byte counts (e.g., two if X <- A and three
if X <- A+B) per cycle count. The issue boils down to whether we think it is
reasonable to differentiate between these different types of bus cycles. We argue that
we should differentiate since a three-byte cycle is more "complex" in either time or
hardware than is a two-byte cycle. Three-byte cycles usually involve an ALU
operation, i.e., extra time, and three-byte cycles make use of the general structure of
the canonical CFA processor -- simple structures such as the PDP-11/20 or PDP-11/40
require two two-byte cycles to get done what the canonical CFA processor gets done in
one three-byte cycle.

^There exist implementations where the PC state is kept in primary memory, not in a
register file in the processor, e.g., the TI 9900 and the IBM [model illegible]. In
these low-speed implementations much of the R measure must be transferred to the M
measure to estimate the true performance.

o. Role of ISP Simulator in M and R Measurement

Both the M and the R measures were measured indirectly by first measuring
the number of instructions executed of each instruction type and addressing mode.
The number of executions of each instruction type and addressing mode was
determined (in most cases automatically via the ARF). The appropriate M and R values for
each instruction and addressing mode were then determined from the worksheets given
in Appendix D, multiplied by the appropriate factor, and totaled for each program.
Although more direct procedures were considered for measuring M and R on the ISP
simulator, they were rejected within the context of the time constraints of the CFA
committee because:

(1) It was impossible to run some of the test programs on the ARF (e.g.,
floating point instructions were not implemented in any of the ISP descriptions).

(2) More direct measurements were subject to considerable variation due
to the details of the ISP programs which describe the candidate architectures. (See
the appendices of Volume IV for the ISP descriptions of the three final
architectures.) This variation was substantial because the ISP's were created by different
individuals, and because the primary constraint on the description was descriptive
clarity, and not efficiency (in the sense of accessing registers the minimum required
number of times).
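The indirect procedure (count executions per instruction type, look up a per-execution value, multiply, and total) can be sketched as follows. The worksheet here holds only the M values quoted earlier for the S/370 inner product example, keyed by mnemonic as read from Table 3-1; the trace, the dictionary layout, and the function name are illustrative assumptions, and the Appendix D worksheets themselves are not reproduced.

```python
from collections import Counter


def total_measure(counts, worksheet):
    """Total a measure: executions of each instruction type times the
    per-execution value looked up in a worksheet."""
    return sum(n * worksheet[kind] for kind, n in counts.items())


# Per-execution M values (bytes) from the inner product example.
M_WORKSHEET = {"LA": 4, "SDR": 2, "SR": 2, "LE": 8, "ME": 8,
               "ADR": 2, "BCT": 4, "STD": 12}

# Hypothetical execution trace standing in for simulator output:
# five setup instructions, ten passes through the loop, one store.
trace = (["LA", "LA", "LA", "SDR", "SR"]
         + ["LE", "ME", "ADR", "LA", "BCT"] * 10
         + ["STD"])

counts = Counter(trace)
print(counts["LA"], total_measure(counts, M_WORKSHEET))
```

The same function would total R if given a worksheet of per-execution R values; none are supplied here because they would have to be invented.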

(c) R Measure Calculation.

To simplify the calculation of the R measures, R values for all instructions
were considered to have the form:

R = R_f + R_a + R_{op}                                                    (3.1)

where R_f depends upon the instruction format, R_a is dependent upon the addressing
mode of the instruction, and R_{op} is determined by the operation performed.

The calculation of R measures is illustrated in Figure 3-4, which shows how the
R count is determined for an IBM S/370 RX format instruction:

A R4,X(R2,R7).

RX instructions are considered to have four distinct addressing modes, depending
upon whether the base or index registers specified are zero or nonzero. Using the
ARF, and inserting suitable "hooks" in the ISP descriptions, it was easy to count
the number of address calculations of each type, the number of executions of each
particular instruction format (i.e., RR, RX, SS, etc.), and the number of executions
of each operation.

RX, RS, & SI INSTRUCTION INTERPRETATION

                                        R     COMMENT

IR<0:15> <- Mh[MAR]                     2     Get first halfword in instr reg
MAR <- MAR + 2                          3     Inc counts only 1 byte
IR<16:31> <- Mh[MAR]                    2     Get rest of instr in IR
PC <- PC + 4                            3     Inc Prog Counter
address interpretation                  -
instruction execution                   -
MAR <- PC                               6     Set up MAR for next instr.

TOTAL                                  16

RX ADDRESS CALCULATION

                                        R     COMMENT

1. B2 = 0, X2 = 0
   MAR <- IR<20:31>                     5     Just 12 bits from IR

2. B2 = 0, X2 > 0
   MAR <- IR<20:31> + R[X2]<8:31>       8     12 bits plus 24 from reg

3. B2 > 0, X2 = 0
   MAR <- IR<20:31> + R[B2]<8:31>       8

4. B2 > 0, X2 > 0
   MAR <- IR<20:31> + R[B2]<8:31>       8
   MAR <- R[X2] + MAR                   9     12 bit displ plus two regs
   TOTAL                               17

EXAMPLE INSTRUCTION

A R4,DISP(R2,R7)                              RX ADD INSTR

                                        R

RX instruction interpretation          16
address interpretation                 17
MBR <- Mw[MAR]                          4
R[R1] <- R[R1] + MBR                   12

TOTAL                                  49

Figure 3-4. IBM S/370 R Measure Example
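As a check on the Figure 3-4 totals, the decomposition R = R_f + R_a + R_op can be sketched in a few lines. The row values are those listed in the figure (2, 3, 2, 3, and 6 for instruction interpretation; 5, 8, 8, and 8 + 9 for the four addressing cases; 4 and 12 for the operand fetch and the add). Grouping the operand fetch and the add together as R_op is an assumption made here, and the function names are hypothetical.

```python
# R_f: RX instruction interpretation rows from Figure 3-4.
R_FORMAT_RX = 2 + 3 + 2 + 3 + 6


def rx_address_cost(base, index):
    """R_a for an RX instruction; a register number of 0 means 'not used'."""
    if base == 0 and index == 0:
        return 5            # displacement only
    if base == 0 or index == 0:
        return 8            # displacement plus one register
    return 8 + 9            # displacement plus base, then plus index


# R_op for the add: 4-byte operand fetch into the MBR plus the 12-byte add.
R_OP_ADD = 4 + 12


def r_rx_add(base, index):
    """Total R = R_f + R_a + R_op for an S/370 RX add."""
    return R_FORMAT_RX + rx_address_cost(base, index) + R_OP_ADD


print(r_rx_add(base=2, index=7))   # the example instruction
print(r_rx_add(base=0, index=0))   # cheapest addressing case
```

The first call reproduces the figure's total for the example instruction; the second shows how the count drops when neither a base nor an index register is specified.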

Figure 3-5 shows a similar instruction for the Interdata 8/32:

A R1,D2(X2)

Although the addressing modes for the Interdata 8/32 are somewhat different from
those of the IBM S/370, the example chosen also adds two registers (the PC and X2)
plus a displacement to generate the effective address, and the basic structures of
the machines are quite similar. Consequently, the corresponding add instruction
generates the same R count in both machines.

While calculation of the R measures for the Interdata 8/32 and IBM S/370 is
straightforward in most cases, this is not true for the PDP-11. This is because the
IBM S/370 and Interdata 8/32 have a rich set of operation codes, which determine the
precise instruction format, and which limit the addressing modes to a few similar
possibilities. The destination operand of an IBM S/370 add instruction is always a
register, and there are only 4 possible (and very similar) source operand addressing
modes. Moreover, the source operand is always in memory. The PDP-11 has far fewer
operation codes, but a rich set of addressing modes. In general, either the source
or destination operand may be a register or a memory location, which is addressed in
one of 7 ways. That means that each two operand instruction has a total of 64
combinations of addressing modes. In addition, there are interactions between the
operations and addressing modes, which affect the R measure. For example, it makes a
difference in the R of a MOV instruction if the destination of the MOV is a register
or a memory location. It also makes a difference if the destination of an ADD is a
register or a memory location. Unfortunately, the effects are different for the ADD
and the MOV, because the ADD fetches and stores in the destination, while the MOV only
stores in the destination.

The optimum R measure for each of 12x64 combinations of double operand instruction
and addressing modes could in principle be computed, as well as the R measure for
each of 46x8 combinations of floating point instructions and addressing modes, and
roughly 30x8 combinations of single operand instructions and addressing modes. This
would have been very tedious, and would have corresponded to an implementation
which directly decoded the combination of the OP code, source mode, and destination
mode, rather than just the OP code, and then implemented individual microcoded
routines for each case.

Of course, such an implementation is possible, as are various compromises which
would directly decode special cases of interest, but not every possible case. These
cases would not correspond to the kind of simple, regular implementation we have
assumed in our development of the R measure. The PDP-11 R measure was calculated
assuming that only the operation code was decoded by the control unit. Moreover,
the remaining instruction decoding, which is assumed to be performed in microcode,
is done in a very regular and straightforward way. Obvious interactions of
addressing mode and operation were accounted for within this regular structure, but
no attempt was made to optimize every special case, or to recognize and optimize
important special cases.

Figure 3-6 illustrates how the PDP-11 R measures were computed. Simply fetching
the first word of the instruction and incrementing the PC generates an R measure of
9. A mode 6 address calculation requires an R count of 15. Performing the ADD
itself takes an R count of 8, resulting in a total R count of 32. Figure 3-7
illustrates how this same instruction might have been optimized on a case by case
basis. The result now is R=28, which is a savings of 12.5% in R.


RX1 AND RX2 INSTRUCTION INTERPRETATION

                                          R    COMMENT

    IR<0:15> <- Mh[MAR]                   2    Get halfwd in instr reg
    MAR <- MAR + 2                        3    Inc. counts only 1 byte
    IR<16:31> <- Mh[MAR]                  2    Get rest of instruction
    PC <- PC + 4                          3    Inc prog counter
    address interpretation
    instruction execution
    MAR <- PC                             6    Prepare MAR for next instr

    TOTAL                                16


RX2 EFFECTIVE ADDRESS CALCULATION

                                          R    COMMENT

1.  X2 = 0

    MAR <- IR<17:31> + PC                 8    15 bit disp plus prog ctr

2.  X2 > 0

    MAR <- IR<17:31> + PC                 8    15 bit disp plus prog ctr
    MAR <- MAR + R[X2]<8:31>              9    add 3 bytes from reg

    TOTAL                                17


EXAMPLE INSTRUCTION

    A R1,D2(X2)                                RX2 ADD INSTRUCTION

                                          R

    RX2 instruction interpretation       16
    address interpretation               17
    MBR <- Mw[MAR]                        4
    R[R1] <- R[R1] + MBR                 12

    TOTAL                                49


Figure 3-5. Interdata 8/32 R Measure Example


    MAR <- PC                             4    Load MAR for instr fetch
    IR <- Mw[MAR]                         2    Load instr reg
    PC <- PC + 2                          3    Inc prog ctr
    instruction execution                      instr type dependent

    TOTAL                                 9


BINARY OPERATION, S > 0, D = 0

                                          R    COMMENT

    MAR <- PC                             4    Load MAR for instr fetch
    IR <- Mw[MAR]                         2    Load instr reg
    PC <- PC + 2                          3    Inc PC
    effective addr. calculation                Mode dependent, addr in MAR
    instruction execution                      instr type dependent

    TOTAL                                 9


ADDRESS CALCULATIONS

1.  MODE 2

    MAR <- R[s/d]                         4    Get addr from reg
    R[s/d] <- R[s/d] + 2                  3    This is 2 for byte operations

    TOTAL                                 7

2.  MODE 6

    MAR <- PC                             4    Set up MAR to read addr
    PC <- PC + 2                          3    Inc PC for next instr fetch
    MAR <- Mw[MAR]                        2    Get addr in MAR, and
    MAR <- MAR + R[s/d]                   6    add index

    TOTAL                                15


EXAMPLE INSTRUCTION

    ADD X(%2),%1                               ;ADD INSTRUCTION, S = 6, D = 0

                                          R

    Binary ins interpretation, s > 0, d = 0    9
    Source addr calculation, mode 6      15
    MBR <- Mw[MAR]                        2
    R[s] <- R[s] + MBR                    6

    TOTAL                                32


Figure 3-6. PDP-11 R Measure Example


EXAMPLE INSTRUCTION

    ADD                                        ADD Instruction, S=6, D=0

                                          R    COMMENT

    MAR <- PC                             4    Load MAR for instr. fetch
    IR<0:15> <- Mw[MAR]                   2    Load instr. reg.
    MAR <- MAR + 2                        3    Bump MAR to get
    IR<16:31> <- Mw[MAR]                  2    rest of instruction
    MAR <- IR<16:31> + R[2]               6    Compute source address
    MBR <- Mw[MAR]                        2    Get source operand
    R[1] <- R[1] + MBR                    6    Add source and dest
    PC <- PC + 4                          3    Increment PC

    TOTAL                                28


Figure 3-7. Example of Case by Case R Measure
Optimization for the PDP-11
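
The arithmetic behind the two PDP-11 totals can be checked with a short sketch. The
regular-structure costs (9, 15, and 8) are the three components quoted in the text,
and the case by case costs are the eight register transfers listed in Figure 3-7.
The Python names used here are illustrative assumptions and not part of the CFA
procedure.

```python
# R counts for the PDP-11 ADD with source mode 6 and destination mode 0.
# Regular structure: the three components quoted in the text.
REGULAR_STEPS = [
    ("fetch first instruction word and increment PC", 9),
    ("mode 6 source address calculation", 15),
    ("perform the ADD (operand fetch plus addition)", 8),
]

# Case by case optimization: the eight register transfers of Figure 3-7.
OPTIMIZED_STEPS = [
    ("MAR <- PC", 4),
    ("IR<0:15> <- Mw[MAR]", 2),
    ("MAR <- MAR + 2", 3),
    ("IR<16:31> <- Mw[MAR]", 2),
    ("MAR <- IR<16:31> + R[2]", 6),
    ("MBR <- Mw[MAR]", 2),
    ("R[1] <- R[1] + MBR", 6),
    ("PC <- PC + 4", 3),
]


def total_r(steps):
    """Sum the R counts of a list of (description, cost) pairs."""
    return sum(cost for _, cost in steps)


regular_total = total_r(REGULAR_STEPS)
optimized_total = total_r(OPTIMIZED_STEPS)
# Fractional saving of the optimized sequence relative to the regular one.
savings = (regular_total - optimized_total) / regular_total

print(regular_total, optimized_total, f"{savings:.1%}")
```

The two totals and the relative saving agree with the R=32, R=28, and 12.5% figures
given in the text.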


It can certainly be argued that, because the architectures of the IBM S/370 and
the Interdata 8/32 fit more naturally into the model of equation 3-1, and because
they get to decode an 8 bit operation code which includes some information carried
in the mode fields of the PDP-11, the model chosen is somewhat biased against
the PDP-11.

A number of other minor specifications were also made uniformly to the R measure
calculations of the 3 architectures. These involved instructions which do
different things in different cases. For example, there should properly be a
distinction made between conditional branch instructions which do branch, and those
which do not. To simplify the collection of data, conditional branches and branch
and count type instructions were always assumed to branch.

(d)  Range  of  Applicability  of  the  R Measure 

Ideal execution-rate measures would consider the execution speed of test programs
across the entire space of feasible processor implementations and then weight
the results from each implementation by the number of implementations of that type
projected for the CFA architecture. Since we have neither the time nor the foresight
to collect execution rate measures over all possible implementations, an effort has
been made to define a simple yet representative processor implementation structure.
Early attempts at defining an R measure included a family of implementations (based
on internal bus width) [Fuller et al., 1975], but for practical reasons the committee
backed off to a single R measure.

Figure 3-8 should help to give the rationale behind why R is defined as it is.
For any architecture there is a cost-performance tradeoff to make and in fact it is
because of this fundamental tradeoff that it is so attractive to consider a family
of implementations of a single architecture. This allows users of computer systems
to pick a processor with the cost-performance characteristics that best match the
intended applications.

The basis of the present R measure is to specify as simple a processor
implementation as possible. If we do not apply some knowledge of actual
implementations, we could continue to simplify the processor's internal register
structure until we are down to a universal Turing machine. An R measure based on a
Turing machine would certainly be in some sense fundamental, not be biased toward
any of the candidate architectures, and well defined. However, there might be
serious doubt as to the utility of an R measure based on a Turing machine since no
one has been able to demonstrate that there exists any significant correlation
between Turing machine performance and the performance of practical computing
machines.

Therefore, we have little choice but to reject the Turing machine approach and
define a more practical (and more complex) processor structure. This is what we have
done with the current R measure. Keep in mind there are undoubtedly many "simple"
implementation structures on which it would be appropriate to base the R measure in
addition to the specific one selected. However, the one selected is meant to be a
moderately simplified model of the internal register structure of contemporary
minicomputer processors and microcomputer processors based on bit-slice LSI packages.

A shortcoming of the present R measure (as opposed to the set of R measures
originally proposed) is that it is a questionable indicator of high performance
implementations. Figure 3-8 attempts to illustrate this. Assuming R is a good
indicator of performance for the register structure shown in Figure 3-3, it is
probably also a good indicator of processor implementations "close" to this
canonical structure on the cost-performance curve. However, as we consider high
performance implementations, instruction buffers, caches, pipelining, forward
cancelling, etc. begin to play an important role in processor performance. The
present R measure is of questionable utility in predicting the performance of
architectures on these high performance implementations. Given that much of the
complexity of high performance implementations is designed to maximize the rate at
which the processor can fetch new instructions and data from memory, probably the
best practical indicator the committee can use for the relative performance of very
high performance processors would be simply the M measure.

[Figure 3-8 graphic not reproduced. Recoverable labels: "Turing Machine",
"Canonical CFA Processor"; axis label: "COST".]

* The assumption underlying the use of a scalar R measure is that the
candidate architecture will behave similarly to Architectures A, B, and
C above, and not like D.

Figure 3-8: The Cost-Performance Space of Processor Implementation


c. PROCEDURES FOR COLLECTING DATA AND COMPUTING THE S, M, AND R MEASURES


From the beginning we realized it would not be realistic to give every programmer
the definition of S, M, and R and ask him to calculate these measures for
his own test programs. An important simplification in the computation of the M
and R measures was obtained by tabulating the measures for each instruction (and
addressing mode) for the three architectures. The M and R measures for each
instruction were then tabulated in the form of a "worksheet." Copies of these
worksheets are given in Appendix D.

Given the worksheets, the computation of M and R reduces to counting the number
of times each instruction and addressing mode is executed. Two alternative
procedures existed for this phase of the computation. An ISP simulator was
available (described in Volume IV) that took an ISP description of the computer
architecture and the machine language representation of the test program to be
executed, and simulated the execution of the test program on the machine described
in ISP.

The ISP simulator has been instrumented to accumulate the number of times each
instruction and addressing mode is executed and hence provides exactly the
information needed for computing the M and R measures. The majority of test programs
were measured using the ISP simulator. Unfortunately, there were a few test programs
that were not easily measured using the ISP simulator and these had to be done by
hand. The reasons for measuring some of the test programs by hand were: (1) within
the time available we could not get a working version of the test program in a
machine-readable format that we could input to the ISP simulator on the CMU PDP-10,
or (2) the test program involved too much I/O activity and the ISP simulator was
not constructed to handle I/O operations automatically. (However, in the final
analysis it would probably have been faster and less error-prone to single-step
the ISP simulator through I/O operations interactively and then get instruction
counts from the ISP simulator than to do it manually.)

The final step in the S, M, and R computations was handled by another program
that performed the additions and multiplications indicated on the architecture
worksheets. This program could either read instruction count information directly
from the output files generated by the ISP simulator or request the manual entry of
instruction counts for those test programs measured by hand. In addition to
eliminating many tedious computations, this program eliminated many errors that
would have crept into the measures if this final tallying process had been done by
hand.
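
The tallying step described above amounts to a weighted sum: each execution count
is multiplied by the worksheet value for that instruction and addressing mode, and
the products are added. The following is a minimal sketch of that idea. The
worksheet entries, instruction names, and counts are made-up illustrative values
(they are not taken from the worksheets in the appendix), and the function name is
an assumption.

```python
from collections import Counter

# Hypothetical worksheet: per-execution M and R values for each
# (instruction, addressing mode) pair. All numbers are illustrative only.
WORKSHEET = {
    ("ADD", "indexed,register"): {"M": 6, "R": 32},
    ("MOV", "register,register"): {"M": 2, "R": 15},
    ("BR", "none"): {"M": 2, "R": 12},
}


def compute_measures(counts, worksheet):
    """Return (M, R) totals for one test program.

    counts maps (instruction, addressing mode) to the number of times
    that combination was executed, as a simulator run or a hand count
    would report it. A combination missing from the worksheet raises
    KeyError rather than being silently skipped.
    """
    m_total = 0
    r_total = 0
    for key, executions in counts.items():
        entry = worksheet[key]
        m_total += executions * entry["M"]
        r_total += executions * entry["R"]
    return m_total, r_total


# Hypothetical execution counts for one test program.
example_counts = Counter({
    ("ADD", "indexed,register"): 10,
    ("MOV", "register,register"): 4,
    ("BR", "none"): 3,
})

m_measure, r_measure = compute_measures(example_counts, WORKSHEET)
print(m_measure, r_measure)
```

Keeping the worksheet as data separate from the counting logic mirrors the point
made in the text: if a measure definition changes, only the table needs to be
edited before the totals are recomputed.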


For the specific set of 105 test programs written and measured in the CFA
project, there is a very strong positive correlation between all the M and R
measures. No decision made by the CFA committee would have changed if just
the M measure, rather than a weighted sum of the M and R measures, had been
used.

Another feature of automating the final step of computing the S, M, and R
measures is that if modifications to the S, M, and R measures are required, it is now
possible to change just the measures in the program and then rerun the S, M, and R
computations for all the test programs. Investigating the effect on the R measure
of "microcoded" floating point operations could also be pursued this way.

d. CHRONOLOGY OF DEVELOPMENT OF S, M, AND R MEASURES

Some of the reasons for the particular form the S, M, and R measures have
taken are due to the time constraints and the sequence of development of these
measures over a period of time ranging from early October, 1975 through July, 1976.
The following account traces the development of these measures and attempts to
indicate where important decisions and changes were made in the definition of these
architecture measures.

An initial memo [Fuller and Smith, 1975] discussing selection criteria was
distributed at the initial CFA meeting on 1 and 2 October 1975. This initial memo was
the result of discussions over the course of the summer of 1975 between S. Fuller,
J. Parnas, J. Shore, D. Siewiorek and W. Smith. Following discussion at the 1 and
2 October meeting a Selection Criteria subcommittee was formed to further develop
specific criteria the CFA committee could use in selecting a computer architecture.

The Selection Criteria subcommittee met on two dates in October 1975, the second
being 29 October, and the results of our discussion were reported to the CFA
committee at the 3 and 4 December 1975 meeting [Fuller et al., 1975]. In that report
the original versions of the S, M, and R measures were presented. In the original
definition of the M and R measures, they were in fact a set of measures,
parameterized by the width of the bus. In other words, rather than just M and R,
families of subscripted M and R measures were proposed, where the subscript was the
width of the bus between memory and the processor. The present definition of M is
in fact one member of that family. The other bus widths were dropped because the
variation of bus width added a dimension of complexity in the selection process
that was felt to be of limited utility to the purposes of the CFA committee.
Following the December 1975 CFA meeting a revised definition of the S, M, and R
measures was written and distributed to the CFA committee on 15 January 1976
[Burr, 1976]. This report still left a number of troublesome loose ends,
particularly with respect to the R measure, and finally a memo on 24 May 1976 was
written to clarify these definitions [Burr, 1976]. Many people contributed ideas
and suggestions to the final definition of the S, M, and R measures in addition to
the original Selection Criteria committee. The most active contributors were
W. Burr, D. Siewiorek, M. Barbacci, R. Gordon, and W. Smith. For the purposes of
the CFA committee, the working definitions of the S, M, and R measures are given in
very concrete terms in Appendix D, where these measures are specifically defined
for each instruction, addressing mode, and instruction format for each of the three
final candidate architectures. As should be clear from the definition of the
R measure, and to some extent the M measure, there is a limited but nonnegligible
degree of freedom in applying these measures to a specific architecture. Care was
taken to be as consistent as possible in calculating the measures for the
IBM S/370, PDP-11, and Interdata 8/32, and a single individual, W. Burr, examined
(and did much of) the computation of the M and R measures for all three candidate
architectures.



Both the 20 November 1975 and 15 January 1976 reports on the S, M, and R
measures proposed a relative weighting technique similar to the method used with the
quantitative criteria (see Volume II). However, at the third CFA meeting on 18-20
February 1976 this weighting scheme was rejected and replaced with the life-cycle
cost analysis described in Volume VI. The S, M, and R values are now input as
coefficients to terms estimating the memory size and processor speed of future
tactical computers.

4. STATISTICAL DESIGN OF TEST PROGRAM ASSIGNMENTS*

Selection of statistical experimental designs to compare the candidate computer
architectures involved several major considerations. These are common to all
situations in which a statistical design plan must be chosen.

At an immediate practical level, an experimental design selected must be
executable with the available resources, including time, funds, manpower, and
available hardware and software. Moreover, the design must lead to experimental
results which allow assessment of the main issues under study. That is, there must
be a direct connection between the questions which can be answered from the data
gathered and the questions formulated at the outset of the study. Finally, the
experimental results must be capable of being analyzed by clearly understood
statistical methods which do allow the assessment of the main issues of the study.
These considerations influenced strongly the broad outlines of the designs chosen
for comparison of the candidate architectures.

The test program phase of the computer family architecture evaluation process
involved comparison of twelve test programs on three machines. Approximately
sixteen programmers were available for the study and a complete factorial design
would have required each programmer to write all of the test programs on each of
the machines (for a total of 16 x 12 x 3 = 576 programs). This was clearly not
feasible with the given time and resource constraints, and, consequently, a
fractional design or several fractional designs had to be selected. Fractional
factorial designs are discussed, for example, by [Davies, 1971]. The fractional
designs to be described below incorporate balance in the way test program, machine,
and programmer combinations are assigned.

It was necessary to consider designs which required each programmer to write
test programs for all three machines. Otherwise, comparisons among the machines
could not be separated from comparisons among the programmers. A desirable design
would have instructed each programmer to write a total of six or nine different test
programs, one third of them on each of the three machines. For most of the
programmers in the study time limitations precluded this type of design, and some
compromise was required. The compromise design selected also had to allow for
precise comparisons among the three competing architectures. A type of design that
meets both of these objectives is the nested factorial [Anderson and McLean, 1974].

The test program part of the study actually involved the use of three separate
experimental designs, henceforth referred to as Phase I, Phase II, and Phase III.
Nested factorial designs were used for Phase I and Phase III; Phase II did not use
a nested factorial design and will be discussed later in this section. Phase I was
used to study test programs A through H, those deemed to be of primary interest.
Phase III was used to study test programs I through L.


*Sections 4 and 5 describe the statistical design and analysis of the test
program assignments and resulting data. Those readers who do not want to
study the statistical considerations of the test program study may skip
ahead to Section 6 to get a summary of the principal results of the analysis
described in Sections 4 and 5.

The Phase I design is a pair of nested factorials. The plan of the design is
shown in Figure 4-1. Each programmer thus writes each of two assigned test programs
on all three machines. Moreover, each of the test programs is assigned to two
different programmers. When this design was originally formulated, the plan included
requiring each programmer to write his six test programs in a preassigned randomly
selected order, so as to eliminate possible biases due to learning and gaining of
experience during the course of completing the assignments. This procedure was
discarded, however, when the programmers objected because of the varying
availability of the three machines for debugging. Programmers were instructed to
complete the assigned jobs in conformity with their typical practices and working
habits with regard to order, consultation with other individuals, and other such
considerations. Programmers in the study were not permitted to consult with each
other, however, on any substantive matters of completing their designated
assignments. All programmers were instructed to keep diaries of their work on the
experiment.

As noted above in the discussion of the nested factorial design, the Phase I
design was formulated with the goal of obtaining maximum possible information about
differences between the competing architectures. With the given Phase I design,
comparisons among the three architectures are theoretically not confounded by
differences among either test programs or programmers. The eight test programs
included in the Phase I design are those of main interest to the committee. The
Phase I design was viewed as the most important of the three designs formulated. It
focused attention on direct comparisons of the architectures, treated the test
programs of primary interest, and called for the largest number of observations
among the three chosen designs.

The design termed Phase III was formulated according to the same plan as was
Phase I, except that four test programs and four programmers were utilized. The
design layout is shown in Figure 4-2. The Phase III design contains half as many
observations as the Phase I design and thus gives statistical results of less
precision. The test programs in the Phase III design are of lesser interest to the
committee than those in Phase I. The four programmers in Phase III are distinct
from the eight in Phase I.

Together Phase I and Phase III designs provide a view of all three machines
and the operation of all twelve test programs selected by the committee. A third
experiment, labelled Phase II, was also planned. This was viewed as an auxiliary
effort, designed to provide information additional to that given by Phase I and
Phase III. Phase II was to be completed only if it was clear that the programmers
assigned to it would not be needed to aid in the completion of Phase I and Phase
III. The Phase II design called for three programmers to write nine different test
programs, three on each of the three machines. The programmers assigned to Phase
II were able to devote enough time to the test program study to permit use of a
design which required them to write nine different programs. Six of the Phase I
programs and three of the Phase III programs were selected for Phase II.
Successful completion of Phase II was viewed at the outset as somewhat of a bonus.
Some comparisons among programs not possible in Phase I and Phase III could be made,
and the statistical results of Phase II could be compared to those of the other
two experiments. The Phase II design was formulated as a one-third fraction of a
3^4 complete factorial design. The 3.4.3 plan in [Connor and Zelen, 1969] was used.
This was made possible by dividing the factor test programs, which appears at nine
levels, into two pseudofactors, each at three levels. The layout of the Phase II
design is shown in Figure 4-3. One of the Phase II programmers also participated
in the Phase I design. The only duplicate assignment, however, was test program
Cj on the IBM S/370.
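
To make the structure of a one-third fraction of a 3^4 factorial concrete, the
sketch below builds one such fraction from four three-level factors: programmer,
machine, and two pseudofactors that together identify one of nine test programs.
The defining relation used here (machine = programmer + a + b, modulo 3) is an
assumption chosen only for illustration; it is not necessarily the plan taken from
[Connor and Zelen, 1969], and the level numbering is arbitrary.

```python
from collections import Counter
from itertools import product


def one_third_fraction():
    """Build 27 of the 81 runs of a 3^4 factorial.

    Factors: programmer (0-2), machine (0-2), and pseudofactors a and b
    (0-2 each) that together label one of nine test programs. The machine
    level is fixed by the other three factors through an assumed defining
    relation, which selects exactly one third of the full factorial.
    """
    runs = []
    for programmer, a, b in product(range(3), repeat=3):
        machine = (programmer + a + b) % 3   # assumed defining relation
        program = 3 * a + b                  # nine-level factor from two pseudofactors
        runs.append((programmer, program, machine))
    return runs


runs = one_third_fraction()

# Each programmer writes nine different programs, three per machine.
for p in range(3):
    programs = {prog for who, prog, _ in runs if who == p}
    per_machine = Counter(m for who, _, m in runs if who == p)
    print(p, len(programs), dict(per_machine))
```

Under this assumed relation every programmer writes each of the nine programs
once, three on each machine, and every program appears once on each machine, which
matches the verbal description of the Phase II layout.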
Figure 4-3. Phase II Design

5. ANALYSIS OF TEST PROGRAM RESULTS*


This section describes the experimental results and statistical analysis of the
test program data. The outcomes of the statistical calculations are interpreted in
light of the goals and purposes of the experiment. We shall first focus attention
upon the Phase I experiment. Then the Phase III experiment will be discussed. Some
analysis combining data from Phase I and Phase III will then be presented. Finally,
the Phase II experiment will be described.

a.  PHASE  I MODELS 

As noted in Section 4, the Phase I experiment consists of a pair of nested
factorial designs. A possible model for these nested factorial designs is

    y_{ijk} = C + P_i + T_{ij} + M_k + PM_{ik} + TM_{ijk} + e_{ijk}        (5.1)

    i = 1,2,3,4,  j = 1,2,  k = 1,2,3.

In this equation y_{ijk} is some response generated by the ith programmer writing
the jth test program on the kth machine. Also,

    C        = constant, termed the grand mean

    P_i      = effect due to the ith programmer

    T_{ij}   = effect of the jth test program assigned to the ith programmer

    M_k      = effect of the kth machine

    PM_{ik}  = interaction between the ith programmer and the kth machine

    TM_{ijk} = interaction between the jth test program written by the ith
               programmer and the kth machine

    e_{ijk}  = a random error term, assumed to be normally distributed
               with mean 0 and variance not dependent on the values of i, j,
               and k.

Figure 4-1 indicates that programmers 1-4 are in one nested factorial design and
programmers 5-8 are in the other. Data values from the Phase I experiment were
analyzed using the analysis of variance (ANOVA), as applied to the nested factorial
design. Some summary values resulting from the ANOVA calculations are displayed and
discussed below.

*As mentioned in Section 4, readers may skip ahead to Section 6 to get a summary
of the principal test program results.

The Phase I experiment may also be modelled in a manner different from that
described above. In Phase I there are two factors at eight levels each, programmers
and test programs, and one factor at three levels, machines. The two eight-level
factors may each be replaced by three pseudofactors at two levels each. Consider
the eight programmers, who constitute a single factor at eight levels. Now define
three factors at two levels each. Label these factors A, B, and C, and code their
levels as 0 and 1. Then the correspondence between the levels of the three
pseudofactors and the levels of the original factor (programmers) is shown in
Figure 5-1.

We are concerned with a complete factorial experiment involving 2^6 x 3 = 192 total
observations. The actual Phase I experiment is a 1/4 fraction of this. A model
may be fit using dummy variables to account for various effects and interactions.
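
The pseudofactor coding and the observation counts can be illustrated with a short
sketch. The particular assignment of programmer numbers to (A, B, C) codes below is
a plain binary encoding chosen as an assumption; the correspondence actually used
is the one given in Figure 5-1. The count of Phase I observations follows from the
design described in Section 4 (eight programmers, two test programs each, three
machines).

```python
from fractions import Fraction


def pseudofactors(level):
    """Code an eight-level factor (levels 0-7) as three two-level factors.

    Returns the tuple (A, B, C), each 0 or 1. A binary encoding is assumed
    here purely for illustration.
    """
    if not 0 <= level <= 7:
        raise ValueError("level must be in 0..7")
    return ((level >> 2) & 1, (level >> 1) & 1, level & 1)


# One distinct (A, B, C) code for each of the eight programmers.
codes = [pseudofactors(level) for level in range(8)]

# Complete factorial: three pseudofactors for programmers, three for
# test programs, and one three-level factor for machines.
full_factorial_size = 2 ** 6 * 3

# Phase I as run: 8 programmers x 2 test programs each x 3 machines.
phase1_size = 8 * 2 * 3

fraction_run = Fraction(phase1_size, full_factorial_size)
print(codes)
print(full_factorial_size, phase1_size, fraction_run)
```

Each eight-level factor maps onto 2^3 distinct codes, so the complete factorial has
2^3 x 2^3 x 3 runs, and the Phase I observations make up one quarter of it.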

b. TRANSFORMATION OF THE DATA

The discussion in this subsection is applicable to all three designs, Phase I,
Phase II, and Phase III. Values of the S, M, and R measures for all the experiments
are tabulated in Appendix E. Examination of the data given in Appendix E clearly
shows there is wide variation in the data from one test program to another,
especially for the M and R measures. Various statistical considerations suggest
that some transformation of the raw data prior to analysis is desirable. A
technical discussion of transformation of statistics is given by [Rao, Section 6g,
1973], who illustrates use of the methodology in various contexts.

In the CFA study the purpose of a transformation of the data is to stabilize
variance, so that a model of the form (5.1) will hold. Specifically, the model (5.1)
assumes that the variance of the error term e_{ijk} is constant, or independent of
i, j, and k. Under this assumption inferences which follow from ANOVA calculations,
as described below, are valid.

A variance stabilizing transformation is frequently suggested by consideration
of the experimental situation and prior understanding of the variation to be
expected in the data. For example, consider the M and R measures. Suppose some
programmers each write two test programs and the average run time of the second one
is k times the run time of the first. Then if the standard deviation of the M or R
readings is V for the first test program, it can be expected to be proportional to
kV for the second test program. In other words, the variability (standard deviation)
in run times is directly proportional to the average run time. The accuracy of this
conjecture will be tested in the analysis discussed in the next section, but clearly
there is strong intuitive support for this assumption. Consider the Runge-Kutta
test program. Its M and R measures are dominated by the computation of the inner
loop performing the step-wise solution of the differential equation. Variations in
M and R measures will be a result of alternative encodings of this inner loop.
Average M and R measures will be doubled if the number of iterations requested is
doubled. Moreover, doubling the number of iterations will also cause differences
between the different Runge-Kutta programs to double. When the standard deviation of
the test data is directly proportional to the mean, a logarithmic transformation
will stabilize the variance, that is, remove the dependence of the variance on the
size of the test program [Rao, Sec. 6g, 1973].
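
The argument above can be checked numerically with a small sketch. The readings below are hypothetical, not CFA data; the second "test program" is simply the first with every reading multiplied by k, which is the situation the text describes.

```python
import math
import statistics


def sd_raw_and_log(values):
    """Return (standard deviation of the readings,
    standard deviation of their natural logarithms)."""
    return (statistics.stdev(values),
            statistics.stdev([math.log(v) for v in values]))


# Hypothetical M readings for one test program written by four programmers.
base = [90.0, 100.0, 105.0, 120.0]
k = 8.0                              # second program runs k times as long
scaled = [k * v for v in base]

sd_base, sd_log_base = sd_raw_and_log(base)
sd_scaled, sd_log_scaled = sd_raw_and_log(scaled)

print(sd_base, sd_scaled)            # raw spread grows by the factor k
print(sd_log_base, sd_log_scaled)    # spread of the logarithms is unchanged
```

On the raw scale the standard deviation is multiplied by k, while on the logarithmic scale the factor k becomes an additive shift and the spread does not depend on program size.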

The  model  of  (5.1)  may  be  termed  an  additive  model.  That  is,  each  observation 
is  modelled  as  a sum  of  various  effects  and  interactions,  and  the  statistical  error 
term  is  also  involved  additively. 

Figure 5-1. Correspondence Between Levels of a Given Factor (Programmers) and
Levels of Pseudofactors Used as Substitutes.


When a logarithmic transformation is used for the data, y in (5.1) becomes
the logarithm of the response, such as the M or the R reading. In this case, a
multiplicative model in fact underlies (5.1). We may write
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The  connection  between  (5.1)  and  (5.2)  is 
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Thus,  use  of  the  logarithmic  transformation  on  both  sides  of  (5.2)  yields  (5.1), 
and the multiplicative model (5.2) may be viewed as the meaningful basic underlying
model. As noted above, this situation seems to arise naturally in consideration of
the M and R measures.


Similar considerations for the S measure suggest a square root transformation
is appropriate to stabilize its variance. This transformation arises because the
variance, rather than the standard deviation, of the S measure of the second test
program in the above discussion can be expected to be proportional to kV. Use of
the square root transformation would imply use of the model in (5.1) with y
denoting the square root of the measured S value.


To summarize this discussion, we note that the purpose of a variance stabilizing
transformation in the CFA study is twofold. First, a transformation may
have underlying it a meaningful model for the data, such as the multiplicative
model (5.2). And second, stabilization is necessary to justify the use of ANOVA
techniques to analyze the data. It should be noted that the square root and
logarithmic transformations are only two of a large number of transformations. A
family of practical transforms takes a response z and transforms it according to
z^a for any a > 0. With an appropriate interpretation, the logarithmic
transformation corresponds to the limiting value a = 0. This family of power
transformations is discussed in detail by [Box and Cox, 1964]. In the present study
only the square root and logarithmic transformations have been used. As discussed
above, the former transformation appears to be appropriate for the S measure and
the latter for the M and R measures. Results of statistical analysis for both
transformations on all three measures will in fact be presented below.
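
One common way to give the "appropriate interpretation" of the limit a = 0 is the scaled form (z^a - 1)/a, which tends to ln z as a tends to 0. The sketch below uses that scaled form; it is offered as an illustration of the power family, not as the computation used in the study, and the function name is hypothetical.

```python
import math


def power_transform(z, a):
    """Scaled power transform (z**a - 1) / a, with the logarithm as the
    limiting case a = 0. Requires a positive response z."""
    if z <= 0:
        raise ValueError("response must be positive")
    if a == 0:
        return math.log(z)
    return (z ** a - 1.0) / a


z = 50.0
for a in (1.0, 0.5, 0.1, 1e-6, 0):
    print(a, power_transform(z, a))
```

With a = 0.5 the transform is a shifted and rescaled square root, so it leads to the same analysis as the plain square root; as a shrinks the values approach the natural logarithm.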

STATISTICAL ANALYSIS OF PHASE I DATA


Analysis of variance calculations were performed on both halves of the Phase I
experiment for the √S, √M, √R, ln S, ln M, and ln R measures. Each analysis on a
half of the Phase I experiment involved 24 data values. In each analysis the sample
variance of the 24 values was decomposed into sums of squares attributable to
variations among programmers, test programs, machines, programmer-machine
interactions, and test program-machine interactions. The proportions of the total
variance due to the various sums of squares are summarized in Table 5-1.

These analysis of variance calculations indicate that test program and programmer
variations account for most of the variation in the data in the case of the M
and R measures, and that machine differences are extremely small on a relative
scale. Machine differences are noticeable for the S measure. More detailed and
precise comparisons of the three machines are presented below.

Table 5-1 shows that in the case of the S measure there are some differences
between the two sets of programmers (1-4 vs. 5-8). Each half of the Phase I
experiment is rather small, involving only four programmers and 24 observations.
The outcomes for the S measure suggest that a larger sample of programmers may be
necessary to obtain results that are representative of the population of
professional programmers.

At the end of Section 5.1 an alternative formulation for the model of the Phase
I experiment was discussed. This formulation viewed the experiment as a fractional
factorial design and replaced each of two factors by a set of pseudofactors. Using
dummy variables, we fit models to sets of Phase I data with √S, √M, √R, ln S, ln M,
and ln R as the responses. In each model 24 parameters were fit, leaving 24 degrees
of freedom to measure experimental error. Estimates of the variance of the error
term in the model (5.1) are shown for each type of response in Table 5-2. The
estimates of variance shown in Table 5-2 may be used to construct confidence
intervals for comparisons among the three architectures. As in (5.1), let M_k
denote the differential effect due to the kth machine. We establish the following
convention:

M_1 = effect of the PDP-11

M_2 = effect of the IBM S/370

M_3 = effect of the Interdata 8/32.

Table 5-3 shows estimates of various machine comparisons for the Phase I data. A
95% confidence interval is quoted below each estimate. The confidence intervals
which do not cover the value 0 correspond to comparisons significant at level .05
(= 1 - .95). Thus at level .05 the Interdata 8/32 is superior to the IBM S/370 on
all measures. For the M_1 - M_2 comparison the PDP-11 is adjudged superior at level
.05 on four of the measures and barely misses being superior when the √S and √M
responses are considered. Moreover, the IBM S/370 is inferior to the average
performance of the other two machines on all measures. The estimates of the
comparisons M_1 - M_3 and 1/2(M_1 + M_3) - M_2 are statistically independent, and
no other pairs are independent. It is worth noting that these comparisons among the
competing architectures are based upon consideration of test programs A through H
only. It is reasonable, however, to view the eight programmers in Phase I as
representative of a large population of programmers.
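
As a rough illustration of how an error variance from Table 5-2 turns into a confidence interval, the sketch below computes an interval for M_1 - M_2. It assumes that each machine effect rests on 16 of the 48 Phase I observations, so that the standard error of a difference is sqrt(2 * variance / 16), and it hard-codes the tabulated t value for 24 degrees of freedom. The effect estimates are those quoted in Table 5-4. Table 5-3 itself is not reproduced here, so these intervals are illustrative and need not match the published ones digit for digit.

```python
import math

# Upper 2.5% point of Student's t with 24 degrees of freedom (standard tables).
T_975_24DF = 2.064


def comparison_interval(effect_a, effect_b, error_variance,
                        n_per_machine=16, t_crit=T_975_24DF):
    """Estimate and 95% interval for effect_a - effect_b.
    ASSUMES a balanced design with n_per_machine observations per machine."""
    estimate = effect_a - effect_b
    half_width = t_crit * math.sqrt(2.0 * error_variance / n_per_machine)
    return estimate, estimate - half_width, estimate + half_width


# M_1 - M_2 (PDP-11 minus IBM S/370), Phase I.
sqrt_s = comparison_interval(-0.786, 2.161, 18.175)   # square root of S
ln_s = comparison_interval(-0.148, 0.354, 0.398)      # ln S

print(sqrt_s)   # interval straddles 0
print(ln_s)     # interval lies entirely below 0
```

Under these assumptions the √S interval just covers 0 while the ln S interval does not, which is in line with the statement that the PDP-11 barely misses being judged superior on the √S response.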

Response     Estimate of error variance

√S                  18.175
√M                 272.546
√R                 577.863
ln S                  .398
ln M                  .377
ln R                  .400

Table 5-2.  Estimates of the Variance of the Error Component
in Model (5.1), Phase I Data, 24 Degrees of Freedom

Table 5-4 displays estimates of the effects M_k and μ_k for the various measures.
The latter are obtained by exponentiating the estimates of M_k, and are appropriate
for the logarithmic models only. Since the effects noted in Table 5-4 are
differential values, a value of 0 is neutral for M_k and a value of 1 is neutral
for μ_k. The figures in Table 5-4 are consistent for the different measures and
transformations. The IBM S/370 is noticeably worse than the other two machines. For
the responses √S, √M, and √R, the Interdata 8/32 appears to be modestly better than
the PDP-11.

One may interpret the last three lines of Table 5-4 in the following way. The
IBM S/370 requires 142.5% as much space to represent test programs A through H as
the average of the three machines, while the PDP-11 and the Interdata 8/32 require
86.2% and 81.5% as much space, respectively. Corresponding statements may be given
for execution times as reflected by the ln M and ln R measures.
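
The step from the additive effects M_k to the multiplicative effects μ_k is a single exponentiation. The short sketch below applies it to the ln S effects quoted in Table 5-4; the dictionary names are chosen for illustration.

```python
import math

# Differential effects M_k for the ln S response, Phase I (Table 5-4).
ln_s_effects = {
    "PDP-11": -0.148,
    "IBM S/370": 0.354,
    "Interdata 8/32": -0.205,
}

# mu_k = exp(M_k): space required relative to the average machine.
ratios = {machine: math.exp(effect) for machine, effect in ln_s_effects.items()}

for machine, ratio in ratios.items():
    print(f"{machine}: {100 * ratio:.1f}% of the average")
```

The results agree to three decimals with the μ values .862, 1.425, and .815 in the table, which is where the percentages quoted in the text come from.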


Measure        √S        √M         √R       ln S     ln M     ln R

M_1          -.786    -1.916     -4.788     -.148    -.230    -.247
M_2          2.161     6.387     16.502      .354     .443     .462
M_3         -1.374    -6.466    -11.714     -.205    -.212    -.235

μ_1                                          .862     .795     .781
μ_2                                         1.425    1.557    1.619
μ_3                                          .815     .809     .791

M_1, μ_1: effects for PDP-11
M_2, μ_2: effects for IBM S/370
M_3, μ_3: effects for Interdata 8/32


Table 5-4.  Estimates of Machine Effects in Models (5.1) and (5.2), Phase I




PHASE III MODELS AND RESULTS

The models for Phase III experiments are the same as in (5.1) and (5.2), except
that the subscript i assumes the values 1 and ^ only. The layout of the Phase III
designs is illustrated in Figure A-il.

Table 5-5 gives a summary of analysis of variance calculations for the Phase III
data. It is comparable to Table 5-1.

Estimates of the variance of the error term in the Phase III version of model
(5.1) are shown in Table 5-6, which is the analog of Table 5-2. The estimates are
based on eight degrees of freedom. That is, models were fit to sets of Phase III
data with √S, √M, √R, ln S, ln M, and ln R as the responses. In each model 24 data
values were employed, and dummy variables were used to fit 16 parameters. It should
be noted that the Phase III estimates of variance have less precision than those of
Phase I; they are based on eight degrees of freedom rather than 24. The
corresponding figures in Tables 5-2 and 5-6 are comparable, though, with several of
the pairs very close.

Table 5-7 is the analog of Table 5-3, and Table 5-8 the analog of Table 5-4.
None of the confidence intervals shown in Table 5-7 fails to cover the value 0.
However, it is apparent that the PDP-11 performed noticeably worse than the other
two machines in Phase III. Also, there is very little difference between the IBM
S/370 and the Interdata 8/32 in Phase III.

The relatively poor performance of the PDP-11 in Phase III appears to be due to
its inability to handle test program I, quicksort. Certainly part of the
explanation for the poor performance of the IBM S/370 in Phase I can be attributed
to test program A, I/O kernel with four priority levels. In the next section
results from Phase I and Phase III are combined to produce overall estimates of
machine effects and overall comparisons of the machines.

e.  COMBINATION OF PHASE I AND PHASE III RESULTS

Let θ_I denote an estimate of a machine effect or comparison in Phase I. Let
θ_III denote the estimate of the same effect or comparison in Phase III. In the
previous two sections such estimates were given, as well as some confidence
intervals. The purpose of this section is to present estimates of the form

a θ_I + (1 - a) θ_III,

where a is chosen to minimize the variance of the resulting linear combination and
0 < a < 1. Table 5-9 shows estimates of machine comparisons and 95% confidence
intervals. The value of a for each column in the table is given along the top
border. In all columns but the fourth more weight is given to the Phase I data.
Table 5-10 gives estimates of machine effects with Phase I and Phase III data
combined.
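
For two independent estimates with variances v_I and v_III, the variance of the linear combination is a^2 v_I + (1 - a)^2 v_III, which is minimized at a = v_III / (v_I + v_III). The sketch below applies this to the error variances of Tables 5-2 and 5-6, under the assumption that the variance of each phase's estimate is proportional to its error variance divided by the number of observations per machine (16 in Phase I, 8 in Phase III). The function names are hypothetical.

```python
def min_variance_weight(var_phase1, var_phase3):
    """Weight a on the Phase I estimate that minimizes the variance of
    a*theta_I + (1 - a)*theta_III for independent estimates."""
    return var_phase3 / (var_phase1 + var_phase3)


def combine(theta1, theta3, a):
    """Weighted combination of the two phase estimates."""
    return a * theta1 + (1.0 - a) * theta3


def weight_from_error_variances(s2_phase1, s2_phase3, n1=16, n3=8):
    """ASSUMES var(estimate) is proportional to error variance / n per machine."""
    return min_variance_weight(s2_phase1 / n1, s2_phase3 / n3)


a_ln_m = weight_from_error_variances(0.377, 0.374)   # ln M
a_ln_r = weight_from_error_variances(0.400, 0.308)   # ln R

print(round(a_ln_m, 2), round(a_ln_r, 2))
```

Under this assumption the weights for ln M and ln R come out at about .66 and .61, the same as the values of a printed for those columns, and both exceed one half because Phase I has twice as many observations per machine.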

Five of the confidence intervals for M_2 - M_3 in Table 5-9 fail to cover the value
zero, and the sixth one (for √M) almost fails to cover. Thus, the evidence suggests
that the Interdata 8/32 performs better than the IBM S/370 on all three measures,
S, M, and R. Also, the IBM S/370 tends to be worse than the average of the other
two machines.
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Response 

/S 

/M 

/R 

1 n S 
In  II 
In  S 


Estimate  of 
iS.bOb 
24b .bed 
lu7y . 1 


.174 

.374 

.3UB 


Table 5-6.  Estimates of the Variance of the Error Component in Model (5.1),
Phase III Data, Eight Degrees of Freedom.

Table 5-7.  Estimates of Machine Comparisons and
95% Confidence Intervals, Phase III
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Measure 

/S 

/f1 

/R 

In  S 

In  M 

In  R 

Comparison  of 

Machines 

■'l' 

2.IJ09 

7.959 

17.169 

.133 

.229 

.223 

It, 

-.212 

-5.461 

-9.111 

.042 

-.165 

-.098 

• 1 

-1.797 

-2.498 

-8  .OoO 

-.174 

- .066 

-.125 

1 

1.142 

1.257 

1.250 

/ 

1.083 

.848 

.907 

{ 

.840 

.936 

.882 

M_1, μ_1: effects for PDP-11
M_2, μ_2: effects for IBM S/370
M_3, μ_3: effects for Interdata 8/32

Table 5-8.  Estimates of Machine Effects in Models (5.1) and (5.2), Phase III

The estimates of μ_k in Table 5-10 provide a summary of the Phase I and Phase III
data. The IBM S/370 requires 120.8% as much storage as the average of all three
machines for the 12 test programs studied. According to the M measure estimate, the
IBM S/370 requires 126.6% as much time to "execute" the test programs as the
average of the three machines. The other figures in the lower part of Table 5-10
are interpreted similarly.


Measure 

/s 

/M 

/R 

In  S 

In  M 

In  R 

Compari son 

of  Machines  a=.67 

a = .64 

a=.79 

1 

1 

1 

II 

Q 

^=.66 

a=.61 

i 

.135 

1.63« 

-.177 

.001 

.07b 

.064 

it 

1.373 

3.402 

11.123 

,18^ 

, 23o 

.256 

■1 

i 

-1.514 

-5.u3y 

-10.947 

-.189 

- . 163 

-.192 

“l 

l.ool 

.928 

.938 

U,. 

/ 

1.208 

1.266 

1.292 

< 

.o28 

.830 

.825 

M_1, μ_1: effects for PDP-11
M_2, μ_2: effects for IBM S/370
M_3, μ_3: effects for Interdata 8/32


Table 5-10.  Estimates of Machine Effects in Models (5.1) and (5.2), Phase I and
Phase III Data Combined

f.  PHASE II MODELS AND RESULTS

Analysis of variance calculations were performed on data arising from the
Phase II design. Some of the results for responses √S, ln R, and ln M are
summarized in Table 5-11. This table indicates the proportions of the total
variance attributable to various sums of squares. The corresponding degrees of
freedom are also indicated. The statistical analysis was performed by utilizing the
theory associated with a 1/3 fraction of a 3^4 design. The variance was split into
sums of squares each with two degrees of freedom. Since two of the factors in the
design were in fact pseudofactors at three levels each to account for the nine test
programs, several sets of sums of squares were combined. There is some aliasing in
the design involving second-order interactions.
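
A 1/3 fraction of a 3^4 design can be generated by keeping only the runs that satisfy one linear relation modulo 3. The sketch below uses the relation programmer + machine + t1 + t2 = 0 (mod 3), where t1 and t2 are the two pseudofactors coding the nine test programs. This particular defining relation is an assumption chosen for illustration; the relation actually used in Phase II is not stated in this section.

```python
from itertools import product


def one_third_fraction():
    """Return the 27 runs (programmer, machine, test_program) of a 1/3
    fraction of a 3^4 design. The defining relation is ASSUMED."""
    runs = []
    for programmer, machine, t1, t2 in product(range(3), repeat=4):
        if (programmer + machine + t1 + t2) % 3 == 0:
            # Two three-level pseudofactors identify one of nine programs.
            runs.append((programmer, machine, 3 * t1 + t2))
    return runs


runs = one_third_fraction()
print(len(runs))                     # 27 of the 81 possible runs

for p in range(3):
    programs = sorted(r[2] for r in runs if r[0] == p)
    per_machine = [sum(1 for r in runs if r[0] == p and r[1] == m)
                   for m in range(3)]
    print(p, programs, per_machine)
```

With this relation each of the three programmers writes nine different test programs, three on each machine. The 27 runs leave 26 degrees of freedom, that is, 13 sums of squares with two degrees of freedom each.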


[Table 5-11 layout. Columns: sum of squares, degrees of freedom, and the measures
ln M, ln R, and √S. Rows: Programmers; Test Programs; Machines; Programmers x
Machines; Test Programs x Machines; Test Programs x Programmers.]

Table 5-11.  Phase II ANOVA Calculations: Proportions of
Variance Attributable to Each Sum of Squares


Estimates of differential effects in a model comparable to (5.1) for the three
machines can also be given. For the √S measure they are -.952 for the PDP-11, 1.505
for the IBM S/370, and -.553 for the Interdata 8/32. For the ln M measure the values
are -.oyl, .dub, and .loj for the machines quoted in the same order, and the figures
are -.ooi^, .dJo, and AZS for the ln R measure. Thus, the experimental results tend
to rank the machines with the PDP-11 first by a substantial margin, and the
Interdata ranks second. It should be noted that test program A was included in the
Phase II design, and test programs D and I were not.

g.  ANALYSIS OF PHASE I AND PHASE III USING ONLY THE "BEST PROGRAMS"

Each test program in the Phase I and Phase III designs was written by two
different programmers. Moreover, the models (5.1) and (5.2) for these designs
involved three factors, programmers, test programs, and machines. In this section
we describe analysis of the data when the factor representing the programmers is
eliminated from the designs. This is accomplished by selecting the smaller of the
two S, M, or R readings when a specified test program is written for a specified
machine. The interpretation of this approach to the analysis is that the "best"
programmer is being selected in each case. It is worth noting, however, that "best"
is merely among two persons for the data in Phase I and Phase III.

When the designs are reduced to consideration of two factors in the manner
described above, the model (5.1) is replaced by


y_ij = C + T_i + M_j + TM_ij + e_ij                                  (5.3)


where

y_ij = response when test program i is written for machine j,


E 

.U<I7 

.uib 

.U26 

B 

.0E3 

.053 

.6bU 

.U7o 

.UbB 

i 

.udy 

.Ub3 

.U47 

6 

.13i! 

.li'R 

.121 

4 

.U47 

.U7b 

.U7d 

T_i = effect of test program i,

M_j = effect of machine j,

TM_ij = interaction between test program i and machine j,

e_ij = a random error term, assumed to be normally distributed
with mean 0 and constant variance.

The design (5.3) is a two-way layout (see [Anderson and McLean, 1974], e.g.). We
present below the usual analysis of variance table for √S, ln M, and ln R responses
for the Phase I and Phase III experiments.
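
The arithmetic behind an analysis of variance table for a two-way layout with one observation per cell can be sketched as follows. The data are hypothetical (four test programs by three machines), not values from Appendix E, and the function name is chosen for illustration. With a single observation per cell the interaction and error cannot be separated, which is why the tables carry a combined interaction/error line.

```python
def two_way_anova(table):
    """Sums of squares for a two-way layout with one observation per cell.
    Rows are test programs, columns are machines."""
    rows, cols = len(table), len(table[0])
    grand = sum(sum(r) for r in table) / (rows * cols)
    row_means = [sum(r) / cols for r in table]
    col_means = [sum(table[i][j] for i in range(rows)) / rows
                 for j in range(cols)]
    ss_rows = cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = rows * sum((m - grand) ** 2 for m in col_means)
    # Interaction/error: what is left after removing row and column effects.
    ss_inter = sum((table[i][j] - row_means[i] - col_means[j] + grand) ** 2
                   for i in range(rows) for j in range(cols))
    ss_total = sum((x - grand) ** 2 for r in table for x in r)
    return {
        "test programs": (ss_rows, rows - 1),
        "machines": (ss_cols, cols - 1),
        "interaction/error": (ss_inter, (rows - 1) * (cols - 1)),
        "total": (ss_total, rows * cols - 1),
    }


# Hypothetical transformed responses: 4 test programs x 3 machines.
example = [[2.0, 3.5, 1.8],
           [5.1, 6.9, 5.0],
           [7.2, 8.0, 7.1],
           [3.3, 4.9, 3.0]]

result = two_way_anova(example)
for source, (ss, df) in result.items():
    print(f"{source:18s} SS={ss:8.3f} df={df}")
```

The three component sums of squares add up to the total, and their degrees of freedom add up to the number of observations minus one.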

First consider the √S response for the Phase I design. Table 5-12 displays the
ANOVA calculations. The sum of squares for machines has been decomposed into two
parts, one involving the comparison M_1 - M_3 and the other the comparison
1/2(M_1 + M_3) - M_2. The interaction/error sum of squares has been similarly
decomposed. Details of this decomposition are presented by [Anderson and McLean,
1974], e.g. As above, the correspondence is M_1: PDP-11, M_2: IBM S/370,
M_3: Interdata 8/32.


Sum  of 

Degrees  of 

Mean 

Source 

squares 

freedom 

.''iCiiare 

Test  -'rn 

‘rains 

274.387 

1 

39.148 

'acni fif 

- 1 

.JOl 

1 

.31 1 

/ 'r 

- 

37.989 

1 

37. 4-  J 

sii;i  total 

.«.35U 

2 

Inter,!',  r 

1 '’i/errc'r 

nro'^rat’^s 

l'J.9"2'j 

7 

l.b<.  1 

)-  1,' IXproerams 

Ui.l3_4 

7 

I0.U2 

subtotal 

124.iJb9 

14 

Total 

4 lb. 7 ■16 

23 

Table 5-12.  ANOVA for √S, Phase I, Model (5.3)


Estimates of the differential effects M_1, M_2, and M_3 in (5.3) are -.739, 1.779,
and -1.040, respectively.

Tables 5-13 and 5-14 present the same calculations for the ln M and ln R
responses in Phase I. For ln M estimates of M_1, M_2, and M_3 are -.186, .401, and
-.215, respectively. For ln R they are -.196, .424, and -.228, respectively.
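
The single-degree-of-freedom sums of squares for the two machine comparisons follow from the effect estimates by the standard contrast formula n (sum of c_j M_j)^2 / (sum of c_j^2). The sketch below applies it to the ln M estimates just quoted, taking n = 8 because each machine mean in the "best programmer" layout is based on eight test programs. The function name is hypothetical.

```python
def contrast_sum_of_squares(effects, coefficients, n_per_machine):
    """Sum of squares (1 degree of freedom) for a contrast among
    machine effects in a balanced layout."""
    value = sum(c * e for c, e in zip(coefficients, effects))
    return n_per_machine * value ** 2 / sum(c * c for c in coefficients)


ln_m_effects = [-0.186, 0.401, -0.215]    # M_1, M_2, M_3 for ln M, Phase I
n = 8                                     # test programs A through H

ss_m1_m3 = contrast_sum_of_squares(ln_m_effects, [1, 0, -1], n)
ss_avg_m2 = contrast_sum_of_squares(ln_m_effects, [0.5, -1, 0.5], n)
ss_machines = n * sum(e * e for e in ln_m_effects)

print(round(ss_m1_m3, 3), round(ss_avg_m2, 3), round(ss_machines, 3))
```

The two contrasts are orthogonal, so their sums of squares add up to the machines sum of squares. The values obtained are close to the .003 and 1.932 shown in Table 5-13; small differences are due to rounding of the effect estimates.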

                                      Sum of     Degrees of     Mean
Source                                squares    freedom        Square

Test Programs                         128.981        7          18.426
Machines
  M_1 - M_3                              .003        1            .003
  1/2(M_1 + M_3) - M_2                  1.932        1           1.932
  subtotal                              1.935        2
Interaction/error
  (M_1 - M_3) x programs                 .291        7            .042
  [1/2(M_1 + M_3) - M_2] x programs     3.701        7            .529
  subtotal                              3.992       14
Total                                 134.908       23

Table 5-13.  ANOVA for ln M, Phase I, Model (5.3)


                                      Sum of     Degrees of     Mean
Source                                squares    freedom        Square

Test Programs                         136.962        7          19.965
Machines
  M_1 - M_3                              .004        1            .004
  1/2(M_1 + M_3) - M_2                  2.161        1           2.161
  subtotal                              2.164        2
Interaction/error
  (M_1 - M_3) x programs                 .492        7            .070
  [1/2(M_1 + M_3) - M_2] x programs     4.124        7            .589
  subtotal                              4.615       14
Total                                 146.742       23

Table 5-14.  ANOVA for ln R, Phase I, Model (5.3)

The results of this "best" programmer analysis of the Phase I data are consistent
with other results previously described. Variability in the data may be attributed
primarily to differences among test programs, rather than differences among
machines, and this is especially so for the M and R measures. Moreover, differences
among the three machines are largely described by noting that the performances of
the PDP-11 and Interdata 8/32 are comparable, and that the IBM S/370 is measurably
worse than the other two. It is interesting to note that the decomposition of the
interaction/error sum of squares shown in Tables 5-12 through 5-14 reflects this
assessment of the three machines.

The comparisons described above for the three machines were obtained for an
experiment involving test programs A-H. In the Phase III experiment test programs
I-L were utilized and the discussion in Section 5.4 indicated a different ranking
of the three machines was obtained. Similar results occur for the "best" programmer
analysis. Table 5-15 shows analysis of variance calculations for the response √S.
Estimates of the effects M_1, M_2, and M_3 in (5.3) are 1.403, -.162, and -1.241,
respectively. The decompositions in Table 5-15 use M_2 - M_3 and
1/2(M_2 + M_3) - M_1, which differ from the comparisons in Tables 5-12 through
5-14, it should be noted.


                                      Sum of     Degrees of     Mean
Source                                squares    freedom        Square

Test Programs                         161.399        3          53.800
Machines
  M_2 - M_3                             2.327        1           2.327
  1/2(M_2 + M_3) - M_1                 11.812        1          11.812
  subtotal                             14.139        2
Interaction/error
  (M_2 - M_3) x programs               13.114        3           4.371
  [1/2(M_2 + M_3) - M_1] x programs    77.358        3          25.766
  subtotal                             90.471        6
Total                                 266.009       11

Table 5-15.  ANOVA for √S, Phase III, Model (5.3)

The discussion in this section of analysis of the "best" programmer model
(5.3) may be accurately summarized by stating that the results are very similar
to those obtained in Sections 5.3 and 5.4.

h.  THE PROGRAMMERS' LOGS

In the original memo to programmers on 6 April 1976 the programmers were
asked to keep a diary of how their time on the project was spent. At the time
the request was made, it was expected that the primary use of the diaries would
be in attempting to understand anomalies in the statistical analysis. Though
the memo requested a resolution of hours or half-days, and asked that the time
be delineated into several categories, only seven of the programmers kept
detailed enough records to provide any quantitative information about how
programmer time was spent. Even among these seven, the time scale recorded, and the
separation of portions of the task, varied a great deal. An eighth programmer
recorded information on an hourly basis, but failed to make any distinction among
the different phases of the task.

The availability of facilities had a large effect on the debugging time
recorded. The PDP-11 programs were debugged interactively with a symbolic
debugging package*, on a machine where a simple debugging package is part of the
microcode.** The IBM S/370 programs were debugged on a batch system, where the
debugging tools were core dumps and the trace of parameter values provided by
the driver. For the Interdata 8/32 programs, almost no debugging time was
recorded because most of the debugging was done by one of the programmers who made
periodic weekend journeys to the only available Interdata site (Oceanport, N.J.).
He recorded only the time spent debugging his own programs. When a programmer
recorded an hour debugging on the S/370, it was not clear how much of that time
was spent examining the listing and how much was spent waiting for batch runs.

Even with the seven programmers mentioned, there was often not enough
separation of information. Sometimes a programmer would record "four hours spent
studying all three hardware manuals"; it was impossible to tell in such instances
how much time was spent studying any one machine.

With these caveats in mind, we present the following tables summarizing the
programmer logs. The numbers in parentheses represent the number of programmers
on which the times are based.


* Dynamic Debugging Tool, DDT

** The Digital Equipment Corporation LSI-11
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Total  Time  Spent,  Per  Machine  Per  Programmer 


FDF-11 

IliM  S/J7U 

Interdata  i 

Study 

id  (b) 

(b) 

Sa  (b) 

Codi ng 

4a  (7) 

bb  (/) 

4b. b (7) 

debug  gi  ny 

4a . b { / ) 

JU  (7) 

17. b (5) 

Total  1 

ime  Spent,  Per  Problem 

Per  Programmer 

Study 

Codi  ng 

Uebuggi ng 

A 

^ (d) 

.b  (1) 

U 

.a  (I) 

du  (J) 

C 

la  (1) 

4 (1) 

J 

L 

IS  (J) 

y.b  (a) 

U 

a.b  (1) 

^ (1) 

J 

a.b  (<!) 

ly.a  (J) 

!<!  (1) 

H 

4 (d) 

U) 

i 

4.b  (1) 

14  ( 1 ) 

') 

■J  ( L ) 

11. b (a) 

a (a) 

\ 

^ (1) 

<^4.‘j  (4) 

ly  (4) 

a.b  (a) 

du  (4) 

a.b  (a) 

i.  CHRONOLOGY OF TEST PROGRAM ASSIGNMENT AND ANALYSIS

The statistical approach to the design of the test program assignments was
developed by P. Shaman and S. H. Fuller prior to the third CFA committee meeting
in February 1976. The approach was explained at the meeting, a description
of the methodology was distributed [Fuller and Shaman, 1976], and the CFA
committee agreed to use this approach in the test program assignments. Volunteer
programmers were requested at the February CFA meeting and assignments were sent
out to eight programmers in various Army and Navy laboratories on 6 April 1976.
During the February CFA meeting, it became clear that we could not complete the
test program study without programmers in addition to the eight Army/Navy
volunteers. Therefore, a proposal was submitted by Carnegie-Mellon University (CMU)
to the Army Research Office, Durham, N.C., on 7 April 1976 for support of graduate
students at CMU to write test programs. The proposal was accepted and CMU was
awarded a grant (uAAin;g-/b-b-(J<igy ) to complete the test program study. Nine
additional programmers from CMU were assigned test programs and by mid-July 1976
all but three of the 99 test programs in Phases I, II, and III were written,
debugged, and S, M, and R measurements completed. Fortunately, a few "auxiliary"
test programs in addition to the basic set were written and we were able to
estimate values for the three missing data points. (The missing data points and the
estimated values used by the CFA committee are shown in Appendix E. When these
missing data points did become available no significant changes in the results were
found.)

The group of graduate students at CMU proved crucial to the completion
of the full set of assigned test programs. We gratefully acknowledge their
participation. Three of these students, Leland Szewcrenko, George Mathew, and
liarindra Jain have been particularly helpful through their continued effort on
behalf of this project.

The  results  of  the  test  program  study  were  input  to  the  life  cycle  cost  models 
on  ZJ  duly  19/b  and  the  entire  CFA  committee  was  given  a summary  of  the  results  at 
the  <;4-zb  August  197b  meeting  [Fuller,  Burr,  and  Shaman,  l97o]. 

6.  Summary

This volume of the Computer Family Architecture (CFA) Selection Committee
Final Report has described how the test program phase of the CFA study was
developed, what methodologies have been used, and what were the results of the
study.

Section 1 described the rationale leading up to the selection of the 12
test programs used in the study.

a.  I/O Kernel, Four Priority Levels
b.  I/O Kernel, FIFO Processing
c.  I/O Device Handler
d.  Large Fast Fourier Transform
e.  Character Search
f.  Bit Test, Set, Reset
g.  Runge-Kutta Integration
h.  Linked List Insertion
i.  Quicksort
j.  ASCII to Floating Point Conversion
k.  Virtual Memory Space Exchange

The  full  sped tication  of  these  test  nrograms  is  given  in  Appendix  A.  The 
test  pronrams  were  meant  to  span  the  range  of  application  most  important  to  UoO. 

Section 3 of this volume defined the three measures of performance used to
evaluate the candidate computer architectures on each test program:

S:  Number of bytes used to represent a test program

M:  Number of bytes transferred between primary memory and the processor
    during execution of the test program

R:  Number of bytes transferred among internal registers of the processor
    during execution of the test program

In Sections 4 and 5 the statistical results of the test program measurements
were discussed. Section 4 dealt with general and specific design considerations,
and the statistical details were presented in Section 5.

The  test  program  study  involved  three  different  parts.  These  are  labeled 
Phase  I,  Phase  II,  and  Phase  III.  In  Phase  I eight  programmers  each  wrote  two 
test  programs  on  each  of  the  three  machines.  The  programs  in  Phase  I were  A 
through  H.  In  Phase  III  four  programmers  each  wrote  two  test  programs  on  each 
machine.  The  test  programs  in  Phase  III  were  I through  L.  The  Phase  II  design 
was implemented to attempt to corroborate information obtained from the other
two designs. Three programmers each wrote nine different test programs, three
on each of the machines. The test programs in Phase II were A, B, E, F, G, H,
J, K, and L.


The principal results of the test program study that were passed on to the
life-cycle cost models (see Volume VI) were the composite performance of the
candidate architectures on the set of 12 test programs in Phases I and III. An
Analysis of Variance (ANOVA) procedure was used to determine the overall relative
performance of the three candidate machines. Unity indicates average performance,
and the lower the score on any of the measures, the better the machine handled
the set of test programs.


Architecture        S       M       R

PDP-11             1.00    0.93    0.94
IBM S/370          1.21    1.27    1.29
Interdata 8/32     0.83    0.85    0.83


Table 6.1. Relative Performance of Candidate Architectures


In other words, our test program results indicate that the IBM S/370 needs 46%
more memory than the Interdata 8/32 to represent the set of test programs (or 21%
more than the average of the three architectures) and the PDP-11 is essentially
average in its use of memory. Similarly, the PDP-11's ability to "execute"
the test programs ranges between 93% and 94% of the average execution time based
on the M and R measures, respectively.
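
The percentages quoted here follow directly from the relative S scores. As a small worked check, assuming the S values 1.21 (IBM S/370) and 0.83 (Interdata 8/32) and an average of exactly 1.00 (the helper name `percent_more` is only illustrative):

```python
# Relative S scores (unity = average of the three machines).
s_score = {"PDP-11": 1.00, "IBM S/370": 1.21, "Interdata 8/32": 0.83}


def percent_more(a: float, b: float) -> float:
    """Return how much larger a is than b, in percent."""
    return (a / b - 1.0) * 100.0


ibm_vs_interdata = percent_more(s_score["IBM S/370"], s_score["Interdata 8/32"])
ibm_vs_average = percent_more(s_score["IBM S/370"], 1.0)
print(f"{ibm_vs_interdata:.0f}% more memory than the Interdata 8/32")
print(f"{ibm_vs_average:.0f}% more memory than the average")
```

The same ratio can be formed for the M and R columns to compare any pair of machines.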

Considering the test program results in a little more detail, in Phase I the
data revealed the IBM S/370 to be significantly worse than the other two machines
on the S, M, and R measures at a confidence level of 95%. Moreover, the overall
performance of the PDP-11 was virtually identical to that of the Interdata 8/32.
Some part of the poor performance of the IBM S/370 can be traced to test program
A (the priority I/O Kernel).

In Phase III alone, none of the comparisons among the three machines was
significant at a confidence level of 95%. The PDP-11 was noticeably the worst
of the three machines on all three measures. The IBM 370 dominated the Interdata
8/32 with regard to the M measure, the Interdata was better for the S measure,
and there was little difference between the two for the R measure. The relatively
poor performance of the PDP-11 appeared to be due to test program I, a
Quicksort program working with a list much larger than the virtual address space
of the PDP-11 (64K bytes).

Statistical results from Phases I and III were combined. In this analysis
the ranking of the three machines from best to worst on all three measures was:
Interdata 8/32, PDP-11, and IBM 370. The PDP-11 was much closer to the Interdata
8/32 than the IBM 370 was to the PDP-11. The average performance for the three
machines in Phases I and III is given above in Table 6.1. Figure 6.1 shows the
95% confidence intervals that surround comparisons of these average values.

The outcome of Phase II largely corroborates the results of the other two
experiments. The ranking of the three machines, from best to worst, is: PDP-11,
Interdata 8/32, IBM 370. This ranking prevails for all three measures, S, M,
and R. For the M and R measures the PDP-11 is substantially better than the
Interdata 8/32. For the S measure, however, the PDP-11 and Interdata 8/32 are
very close to each other in performance. It is important to recall that Phase
II includes test program A, for which the IBM 370 performs relatively poorly,
and does not include test programs D and I, which are relatively difficult to
implement on the PDP-11 because they have very large data structures.

The experimental designs executed in the test program phase of the computer
family architecture evaluation process permitted a quantitative comparison of
the three machines in the study. The experiments were not large-scale, and the
results need to be interpreted with some caution. In particular, only a very
small sample of programmers was used in each phase, especially in II and III.
Future studies of the same size and scope as the present one seem from the
available evidence to be worthwhile undertakings. They could be used to attempt to
corroborate the present results. Larger scale experiments would be much more
informative. (These are likely to require substantially more funds, however.)
The main statistical issues appear to be the need to use a representative sample of
programmers and to obtain estimates of parameters with high precision.

Another important point emerging from this study is that there is a significant
interaction between the architecture under consideration and the test program
being written. This in fact was known a priori to be an important factor. Its
influence is clear in the contrast of Phase I, Phase II, and Phase III results.
Appendix A:  Test Program Specifications


A.  I/O  INTERRUPT  KERNEL,  FOUR  PRIORITY  LEVELS 


A.1 INPUTS

An asynchronous I/O interrupt with associated status registers.


A.2 PROCESSING

The  interrupt  kernel  will  be  activated  by  an  I/O  interrupt  with  priority  level  0,  1,  2,  or  3 
from  one  of  four  devices.  Actual  interrupt  processing  will  be  simulated  by  counting  the 
occurrences  of  each  type  of  interrupt.  Higher  level  interrupts  will  be  able  to  preempt 
processing  of  lower  priority  interrupts.  The  interrupt  handler  must  provide  for  resumption 
of  processing  of  the  preempted  lower  level  interrupt  from  the  point  of  preemption.  As  much 
processing  as  possible  will  be  done  with  higher  priority  I/O  interrupts  enabled. 

Figure 1 represents the functions to be performed. It is recognized that various
architectures may provide automatic (hardware or firmware) support for features that carry
out some of these functions. Appropriate use of such features is allowed. For example,
when an interrupt occurs on the PDP-11, disabling can be achieved automatically by setting
up ahead of time the appropriate processor priority level in the PSW of the device interrupt
vector.


A.3 OUTPUTS

An updated count of interrupts by device.


A.4 CONSTRAINTS

This  program  need  not  be  either  position  independent  or  reentrant. 


The  source  files  are  available  as  specs.pub(c410gin20)  at  CMU-A. 
If  not  send  mail  to  George  Mathew  at  CMU-A 
December  10,  1976  DRAFT 
Figure 1: Priority I/O Kernel


B.  I/O  INTERRUPT  KERNEL,  FIFO  PROCESSING 


B.1 INPUTS

An asynchronous I/O interrupt with associated status registers.


B.2 PROCESSING

The interrupt kernel will be activated by an I/O interrupt from one of four devices which
will be placed in a service queue for first-in-first-out (FIFO) processing. Actual interrupt
processing will be simulated by counting the occurrences of each type of interrupt. Space
should be provided to handle at least ten queued interrupts at one time.

Processing of queued interrupts shall be done with I/O interrupts enabled so that
interrupts will be taken and queued appropriately while previous interrupts are being
processed. Before returning to the originally interrupted application program, a check shall
be made to see if any interrupts remain queued, and these will be processed in FIFO order.

Figure 2a represents the functions to be performed. Appropriate use of an architectural
feature to automatically provide some such function is allowed (see PDP-11 example in I/O
interrupt kernel #1).


B.3  OUTPUTS 

An  updated  count  of  interrupts  by  device. 


B.4  CONSTRAINTS 

This program need not be either position independent or reentrant.


B.5  DISCUSSION 

Consider the main processing loop, from "select next request (FIFO)" to the "Is queue
empty?" test and back via the "NO" branch. Follow the processing from "set run flag off" to
"remove request from queue", through the NO branch of the test, and back to "select next
request" and "set run flag off". During this entire sequence interrupts are disabled; it is
impossible for the "Is run flag on?" test to be executed at this point. The manipulation of the
run flag can thus be safely moved out of the loop, producing Figure 2b. This change should
principally affect the M and R measures, although it could potentially lead to a smaller
program. This change was not made, because it was felt that the measures for all three
machines would be affected equally.
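
The queueing behaviour described in B.2 can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the specified kernel: the class `FifoKernel`, its method names, and the `arrivals_during_service` argument (used to inject interrupts that arrive while a request is being processed) are all hypothetical, and real interrupt enabling and context saving are not modelled.

```python
from collections import deque


class FifoKernel:
    """Toy model of the FIFO kernel: queue requests, count them per device."""

    def __init__(self, devices: int = 4, capacity: int = 10):
        self.counts = [0] * devices   # simulated "processing": one counter per device
        self.queue = deque()          # pending service requests, oldest first
        self.capacity = capacity      # the specification asks for room for at least ten
        self.running = False          # the "run flag" of the flowchart

    def interrupt(self, device: int, arrivals_during_service=None):
        """Handle one interrupt. arrivals_during_service maps a device being
        serviced to the devices that interrupt while it is processed."""
        if len(self.queue) >= self.capacity:
            raise OverflowError("service queue full")
        self.queue.append(device)
        if self.running:              # an outer activation is already draining the queue
            return
        self.running = True
        while self.queue:
            current = self.queue[0]   # select next request (FIFO)
            # interrupts are "enabled" here: nested arrivals are only queued
            for nested in (arrivals_during_service or {}).pop(current, []):
                self.interrupt(nested)
            self.counts[current] += 1 # simulate processing
            self.queue.popleft()      # remove request from queue
        self.running = False


kernel = FifoKernel()
kernel.interrupt(2, arrivals_during_service={2: [0, 3]})
print(kernel.counts)
```

In this model the run flag is set once before the loop and cleared once after it, which corresponds to the modified arrangement discussed above rather than the original one.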
Figure 2a: Original FIFO Kernel

[Flowchart for Figure 2b not reproduced. Box labels: I/O Interrupt; Disable Interrupts;
Save Context; Identify Device; Queue Service Requests; Run Flag On?; Set run flag on;
Select next request (FIFO); Enable Interrupts; Simulate Processing; Disable Interrupts;
Remove request from queue; Restore Context; Enable Interrupts; Return.]

Figure 2b: Modified FIFO Kernel
C.  INPUT/OUTPUT DEVICE HANDLER


C.1 INPUTS

A pointer to a control block with the following fields:

DEVICE     The device identifier for a "typical" tape drive.

OP         The operation to be performed:
              0  Read
              1  Write

NO-TRANS   The number of I/O transfers to be performed. NO-TRANS ≤ 10

BUFFER     An array of ten pointers to buffers which contain or will contain the
           data to be read or written.

LENGTH     An array containing ten values giving the sizes in characters of the
           records to be transferred to the buffers specified in BUFFER.

STATUS     Storage used to hold the device completion status after an I/O transfer.

WORK       Storage available for the device handler to set up whatever is necessary
           for the transfer.


C.2  PROCESSING 

This  test  program  shall  operate  in  the  following  general  environment: 

a.  Applications programs perform high level logical I/O calls that cause queuing of the
control block described by the DEVICE, OP, NO-TRANS, BUFFER, and LENGTH parameters.
The calling application program may request that up to ten records, in arbitrary and not
necessarily contiguous memory locations, and which may have different record lengths,
be either read or written (but not both in one request) to or from memory.

b.  A separate test program exists for the general task of interrupt queuing and handling
(whereas this benchmark is explicitly to test device I/O operation).

c.  This program has two sections, one of which is entered when the initial setup request is
issued by an application program (via an SVC type operation), and a second section which
is entered after each I/O interrupt attributed to the selected tape device. The SVC
handler and interrupt queuing are separate routines and not a part of this program.
Assume that they are taken care of.

d.  The I/O devices handled by this program are all single density nine-track tape transports
(pick any typical model tape transport for your machine).

After an I/O request is issued by an application program, and after the executive queues
the input control block, this test program is initiated and it performs the following
actions:

a.  Check the status of the tape drive. If the device or its channel is busy, exit. If the
device is not operable, branch to a dummy error routine. If the device is available, set
up and initiate the requested transfer. For example, in the /370, fill in the fields of a
CCW (or CCWs), set the CAW, and issue a SIO to the channel; in the PDP-11, fill in the
device registers for the particular device and set the GO bit. Up to ten records may be
written or read as a result of an I/O request.

b.  After  completion  of  the  transfer,  and  a consequent  interrupt,  the  device  handler  is 
reentered  (via  an  interrupt  vector,  first  level  interrupt  handler,  etc.).  The  following 
processing  is  performed: 

i.  The  status  information  is  stored  into  the  STATUS  entry  of  the  control  block.  For 
example,  in  the  /370,  check  the  CSW,  issue  a read-status  SIO,  and  store  appropriate 
information  in  STATUS;  in  the  PDP-11,  identify  the  device,  read  its  status  registers, 
and  store  the  information  in  STATUS. 

ii.  If  the  device  status  indicates  an  unsuccessful  transfer,  abort  any  further  processing 
and  exit. 

iii.  If  the  device  status  indicates  a successful  transfer,  and  if  all  requested  transfers  have 
been  accomplished,  then  exit.  Otherwise  initiate  the  next  requested  transfer  and  then 
exit. 


C.3  OUTPUTS 

The  completion  status  of  the  I/O  transfer  is  returned  via  the  STATUS  entry  of  the  control 
block. 


C.4  CONSTRAINTS 

The  device  handler  is  reentrant  and  self-relocating. 


D.  FAST  FOURIER  TRANSFORM 


D.1 INPUTS

N      The number of data points. This is required to be an integral power of two in
       the range 0 < N ≤ 2**16.

X      A vector holding the N samples as complex numbers. Each complex number
       has a real part and an imaginary part, both of which are 32-bit floating point
       numbers.

W      A vector holding the first N/2 powers of EXP(-2πi/N), where i**2 = -1. That
       is, W(j) = EXP(-2πi/N)**j.

WORK   Pointer to auxiliary working storage.


D.2  PROCESSING 


procedure FFT(N, X, W)

GROUPS ← N

do for PASS ← 0 by steps of 1 until log2(N)-1

    do for all ELEMENT such that 0 ≤ ELEMENT < N/2
        "generate complex addend"

        WEXP ← 0
        if PASS > 0
            then WEXP ← (((ELEMENT * N)/2) / 2**PASS) MOD (N/2)
        end-if

        if WEXP ≠ 0
            then TEMP1 ← X(ELEMENT + N/2) * W(WEXP)
            else TEMP1 ← X(ELEMENT + N/2)
        end-if

        "generate 2 element entries in data vector"

        X1(ELEMENT) ← X(ELEMENT) + TEMP1
        X1(ELEMENT + N/2) ← X(ELEMENT) - TEMP1
    end-do

    if PASS < (log2(N) - 1)
    then
        "execute perfect card shuffle on data vector"

        P ← 2**PASS
        GROUPS ← GROUPS/2

        do for all I such that 0 ≤ I < GROUPS
            do for all J such that 0 ≤ J < P
                INDEX1 ← 2*P*I + J
                INDEX2 ← P*I + J
                X(INDEX1) ← X1(INDEX2)
                X(INDEX1+P) ← X1(INDEX2 + N/2)
            end-do
        end-do
    else
        do for all I such that 0 ≤ I < N
            X(I) ← X1(I)
        end-do
    end-if
end-do
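
As an illustration of how the passes and the card shuffle fit together, the following Python sketch follows the procedure step by step. It is a reading of the pseudocode under stated assumptions, not one of the measured test programs: indices are taken as zero-based, the W vector is computed inside the function instead of being passed in, Python complex numbers stand in for pairs of 32-bit floats, and the function name is hypothetical.

```python
import cmath


def fft_constant_geometry(x):
    """Transform x (length N, a power of two) pass by pass, as in the procedure."""
    n = len(x)
    passes = n.bit_length() - 1                  # log2(N)
    half = n // 2
    # W(j) = EXP(-2*pi*i/N)**j for the first N/2 powers
    w = [cmath.exp(-2j * cmath.pi * j / n) for j in range(half)]
    x = list(x)
    x1 = [0j] * n                                # auxiliary array X1
    groups = n
    for p in range(passes):
        for element in range(half):
            wexp = ((element * n) // 2 // 2 ** p) % half if p > 0 else 0
            temp = x[element + half] * w[wexp] if wexp else x[element + half]
            x1[element] = x[element] + temp
            x1[element + half] = x[element] - temp
        if p < passes - 1:
            # perfect card shuffle of X1 back into X
            stride = 2 ** p
            groups //= 2
            for i in range(groups):
                for j in range(stride):
                    index1 = 2 * stride * i + j
                    index2 = stride * i + j
                    x[index1] = x1[index2]
                    x[index1 + stride] = x1[index2 + half]
        else:
            x = list(x1)                         # last pass: copy X1 back to X
    return x


samples = [1, 1, 1, 1, 0, 0, 0, 0]               # a small step function
print([round(abs(c), 4) for c in fft_constant_geometry(samples)])
```

Every pass pairs element E with element E + N/2, so the addressing pattern is the same in each pass; the shuffle is what moves intermediate results into position for the next one.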
D.3 PROGRAMMING CONSIDERATIONS

X1 is an auxiliary array of size N used to accommodate the card shuffle. Remember that
both X and W are arrays of complex numbers.


D.4 OUTPUTS

A vector of FFT coefficients in X.


D.5 DISCUSSION

The original version of this specification included background on the history of the FFT,
examples of the execution for small values of N, a short analysis of the algorithm, and some
notes on performing complex arithmetic.


E.  CHARACTER  SEARCH 


E.1 INPUTS

SRCHSTR     Pointer to a string of characters to be searched.

SRCHLNGTH   The length of that string. < 2**15

SRCHARG     A pointer to a string of characters.

ARGLNGTH    The length of that string. ≤ 2**10

LOC         An integer return code.

WORK        A pointer to any needed working storage.


E.2  PROCESSING 

SRCHSTR  shall  be  searched  to  see  if  it  contains  a substring  which  exactly  matches 
SRCHARG.  If  the  search  is  successful  the  relative  character  position  of  the  first  occurrence 
of  the  substring  shall  be  returned.  If  no  match  is  found  a negative  value  shall  be  returned. 


E.3  OUTPUTS 

If the search is successful, LOC is set to the relative character position of the first
occurrence of SRCHARG in SRCHSTR. Otherwise a negative value is stored in LOC.
E.4 CONSTRAINTS

The routine shall be reentrant and position independent.

procedure CHARSRCH(SRCHSTR, SRCHLNGTH, SRCHARG, ARGLNGTH, LOC, WORK)
integer I

LOC ← -1

do for all I such that 0 ≤ I ≤ SRCHLNGTH-ARGLNGTH or until LOC ≥ 0
    if the substring of SRCHSTR from I to I+ARGLNGTH-1 = SRCHARG
        then LOC ← I
    end-if
end-do
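
A direct transcription of this search into Python might look as follows. It is a sketch only: Python strings carry their own lengths, so the SRCHLNGTH and ARGLNGTH parameters are dropped, and the loop is assumed to stop at the first match.

```python
def charsrch(srchstr: str, srcharg: str) -> int:
    """Return the position of the first occurrence of srcharg in srchstr, or -1."""
    loc = -1
    last_start = len(srchstr) - len(srcharg)
    i = 0
    while i <= last_start and loc < 0:            # stop at the first match
        if srchstr[i:i + len(srcharg)] == srcharg:
            loc = i
        i += 1
    return loc


print(charsrch("ARCHITECTURE", "TECT"))
print(charsrch("ARCHITECTURE", "PDP"))
```

Positions are counted from zero, so a match at the very start of the string returns 0 and remains distinguishable from the negative "not found" value.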


F.  BIT  TEST,  SET,  OR  RESET 


F.1 INPUTS

F      Function code
          1  test bit
          2  set bit
          3  reset bit

N      Relative bit number to be tested, 0 ≤ N ≤ 1000

A1     Pointer to a tightly packed bit string on an even word boundary

RC     Return code which is set to indicate the original status of the bit (0 or 1).

WORK   Pointer to any needed working storage.


F.2  PROCESSING 

Bit number (N mod (word length)) of word (A1 + N/(word length)) is tested. If it is zero,
then RC is set to zero; otherwise RC is set to one. If F is 2, the bit is then set to 1. If F is
3, the bit is then set to 0. For all other values of F, the bit is unchanged. The bit string is
assumed to begin at the first bit of location A1.


F.3  CONSTRAINTS 


The  routine  shall  be  reentrant  and  position  independent. 
procedure BITTEST(F, N, A1, RC, WORK)
integer ABIT, D

ABIT ← A1 + N/(word length)
D ← N mod (word length)

if Dth bit at address ABIT = 1
    then RC ← 1
    else RC ← 0
end-if

if F = 2
    then Dth bit at address ABIT ← 1
    else if F = 3
        then Dth bit at address ABIT ← 0
    end-if
end-if
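
The addressing arithmetic can be tried out with a short Python sketch. Two points here are assumptions rather than part of the specification: the word length is fixed at 16 bits, and bit 0 is taken to be the leftmost (most significant) bit of a word. A list of integers stands in for the packed bit string at A1.

```python
WORD_LENGTH = 16  # assumption: a 16-bit word


def bittest(f: int, n: int, words: list) -> int:
    """Test bit n of the packed string; set it if f == 2, reset it if f == 3.
    Returns the original value of the bit (the RC of the specification)."""
    index = n // WORD_LENGTH                  # ABIT: which word holds the bit
    d = n % WORD_LENGTH                       # D: bit position inside that word
    mask = 1 << (WORD_LENGTH - 1 - d)         # assumption: bit 0 is the leftmost bit
    rc = 1 if words[index] & mask else 0
    if f == 2:
        words[index] |= mask
    elif f == 3:
        words[index] &= ~mask
    return rc


bits = [0x0000, 0x8000]
print(bittest(1, 16, bits), bittest(2, 0, bits), bittest(3, 16, bits), bits)
```

On a machine with a different word length or bit numbering only `WORD_LENGTH` and the mask computation would change.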


G.  RUNGE-KUTTA  INTEGRATION 


G.1 INPUTS

T0     Initial value of T, single precision floating point

Y0     Initial value of Y, single precision floating point

H      Interval of integration, single precision floating point

TMAX   Final value of T, single precision floating point

YMAX   Final value of Y returned, single precision floating point

WORK   Pointer to any needed working storage


G.2  PROCESSING 

Given  the  differential  equation 

F(t,y) = t + y = dy/dt


and the initial conditions T0 and Y0, use a third order Runge-Kutta integration from T = T0 to
T = TMAX to determine YMAX, using an integration interval H. All calculations are single
precision floating point.

G.3 OUTPUTS

The  value  of  Y at  time  TMAX  is  returned  in  YMAX. 


G.4  CONSTRAINTS 

The  routine  is  position  independent  and  reentrant. 


procedure RUNGEKUTTA(T0, Y0, H, TMAX, YMAX, WORK)
real K1, K2, K3

YMAX ← Y0

do for all T from T0 incremented in steps of H until T > TMAX
    K1 ← H * (T + YMAX)
    K2 ← H * (T + H/2 + YMAX + K1/2)
    K3 ← H * (T + 3*H/4 + YMAX + 3*K2/4)
    YMAX ← YMAX + 2*K1/9 + K2/3 + 4*K3/9
end-do
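
The integration can be checked numerically against the closed-form solution of dy/dt = t + y. The sketch below is illustrative only and makes several assumptions: it uses the weights 2/9, 1/3 and 4/9, which sum to one and match the H/2 and 3H/4 stage points of the procedure; it takes exactly (TMAX - T0)/H steps, which is one way of reading the loop bound; and it uses Python double precision floats, not single precision.

```python
import math


def runge_kutta(t0: float, y0: float, h: float, tmax: float) -> float:
    """Integrate dy/dt = t + y from t0 to tmax with a third-order Runge-Kutta rule."""
    steps = round((tmax - t0) / h)        # assumption: (tmax - t0) is a multiple of h
    y = y0
    for k in range(steps):
        t = t0 + k * h
        k1 = h * (t + y)
        k2 = h * (t + h / 2 + y + k1 / 2)
        k3 = h * (t + 3 * h / 4 + y + 3 * k2 / 4)
        y += (2 * k1 + 3 * k2 + 4 * k3) / 9
    return y


approx = runge_kutta(0.0, 1.0, 0.01, 1.0)
exact = 2 * math.exp(1.0) - 2             # y = 2*e**t - t - 1 when y(0) = 1
print(abs(approx - exact))
```

Halving H should shrink the printed error by roughly a factor of eight, which is the behaviour expected of a third-order method.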


H.  LINKED LIST INSERTION


H.1 INPUTS

LISTCB  Pointer  to  a list  control  block,  containing  the  entries 

HEAD  Pointer  to  first  node 

TAIL  Pointer  to  last  node 

NUMENTRIES  Number  of  entries  in  list 


NEWENTRY  Pointer  to  a new  entry  to  be  inserted. 


H.2  PROCESSING 


List entries have the form

KEY
NEXT   pointer to next entry
PREV   pointer to previous entry


All fields are one "word" (16 or 32 bits, depending on the machine) long. The KEY is a
signed 32-bit integer. The first node in the list is marked by a PREVPTR of zero, and the last
is marked by a NEXTPTR of zero. NEWENTRY is inserted in order in the list. If there are
duplicate keys the NEWENTRY is inserted after all matching entries. The list may be empty
when the routine is called; this is indicated by a zero value in NUMENTRIES.


H.3  OUTPUTS 

The  new  entry  is  updated  to  point  forward  and  backwards.  Both  previous  and  next 
entries  are  updated  to  point  to  the  new  entry.  NUMENTRIES  is  updated.  If  the  new  entry  is 
a head  or  a tail  then  the  pointers  in  the  list  control  block  are  updated  accordingly. 


H.4  CONSTRAINTS 

The  routine  is  reentrant  and  position-independent. 


procedure LISTINSERT(LISTCB, NEWENTRY)

"the notation POINTER.FIELD is used to access a
particular field of the structure pointed to by POINTER"

pointer PRESENT

if LISTCB.NUMENTRIES = 0
then
    "list is empty, so initialize"
    LISTCB.HEAD ← LISTCB.TAIL ← NEWENTRY
    LISTCB.NUMENTRIES ← 1
    NEWENTRY.NEXT ← NEWENTRY.PREV ← 0
else
    "list not empty"
    PRESENT ← LISTCB.HEAD
    LISTCB.NUMENTRIES ← LISTCB.NUMENTRIES + 1

    "determine position of new entry"
    while NEWENTRY.KEY ≥ PRESENT.KEY and PRESENT.NEXT ≠ 0 do
        PRESENT ← PRESENT.NEXT

    if PRESENT.PREV = 0 and NEWENTRY.KEY < PRESENT.KEY
    then
        "new list head"
        LISTCB.HEAD ← NEWENTRY
        NEWENTRY.PREV ← 0
        PRESENT.PREV ← NEWENTRY
        NEWENTRY.NEXT ← PRESENT
    else
        if NEWENTRY.KEY ≥ PRESENT.KEY
        then
            "new list tail"
            PRESENT.NEXT ← LISTCB.TAIL ← NEWENTRY
            NEWENTRY.NEXT ← 0
            NEWENTRY.PREV ← PRESENT
        else
            "insert in middle"
            NEWENTRY.NEXT ← PRESENT
            NEWENTRY.PREV ← PRESENT.PREV
            PRESENT.PREV ← NEWENTRY

            "back up and link with predecessor"
            PRESENT ← NEWENTRY.PREV
            PRESENT.NEXT ← NEWENTRY
        end-if
    end-if
end-if
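
The three insertion cases (new head, new tail, middle) can be exercised with a short Python sketch. The class names `Node` and `ListCB` are hypothetical stand-ins for the list entry and the list control block, `None` plays the role of the zero pointer, and the head and middle cases are folded into a single branch for brevity.

```python
class Node:
    def __init__(self, key):
        self.key, self.next, self.prev = key, None, None


class ListCB:
    def __init__(self):
        self.head, self.tail, self.numentries = None, None, 0


def list_insert(cb: ListCB, new: Node) -> None:
    """Insert new in key order, after any entries with an equal key."""
    cb.numentries += 1
    if cb.head is None:                      # empty list
        cb.head = cb.tail = new
        new.next = new.prev = None
        return
    present = cb.head
    while new.key >= present.key and present.next is not None:
        present = present.next
    if new.key >= present.key:               # ran off the end: new tail
        new.prev, new.next = present, None
        present.next = cb.tail = new
    else:                                    # insert before present
        new.next, new.prev = present, present.prev
        if present.prev is None:
            cb.head = new                    # new head
        else:
            present.prev.next = new
        present.prev = new


def keys(cb: ListCB) -> list:
    out, node = [], cb.head
    while node is not None:
        out.append(node.key)
        node = node.next
    return out


cb = ListCB()
for k in (5, 1, 9, 5, 3):
    list_insert(cb, Node(k))
print(keys(cb))
```

Using `>=` in the scan is what places a duplicate key after all matching entries, as the processing rules require.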


I.  QUICKSORT 


I.1 INPUTS

N      The number of records to be sorted. N < 10000

REC    Pointer to the first element of an array of N+2 16-character records: R0, R1,
       ..., RN, RN+1. R0 is a dummy record with a low-valued key, and RN+1 is a
       dummy record with a high-valued key. Each record has a 7-character key
       found in positions 3 through 9, counting from 0.

M      Integer parameter specifying the changeover point between QUICKSORT and a
       simple insertion sort. 5 < M ≤ 20

WORK   Pointer to any needed working storage.

I.2 PROCESSING

Records R0 to RN+1 are sorted into character collating sequence.


I.3 CONSTRAINTS

The  routine  is  reentrant  and  position-independent. 


procedure QUICKSORT(N, REC, M, WORK)
integer L, R, I, J, K
integer array STACK[0:2*F(N)-1]
character string V

REC[N+1] ← ∞
L ← 1; R ← N
do forever
    I ← L; J ← R+1; V ← REC[L]
    do forever
        do I ← I+1 until REC[I] ≥ V end-do
        do J ← J-1 until REC[J] ≤ V end-do
        if J > I
            then swap REC[I] with REC[J]
            else go to end-first
        end-if
    end-do
end-first:
    swap REC[L] with REC[J]
    if both subfile sizes (J-L and R-J) ≤ M
    then
        if stack is empty
            then go to end-outer
            else pop L and R from stack
        end-if
    else
        if smaller subfile size ≤ M
            then set L and R to lower and upper limits
                 of larger subfile
            else
                push lower and upper limits of larger
                    subfile onto stack
                set L and R to limits of smaller subfile
        end-if
    end-if
end-do

end-outer:
do for I from N-1 to 1 in steps of -1
    if REC[I] > REC[I+1] then
        V ← REC[I]; J ← I+1
        do forever
            REC[J-1] ← REC[J]; J ← J+1
            if REC[J] ≥ V then go to end-last end-if
        end-do
end-last:
        REC[J-1] ← V
    end-if
end-do


I.4 NOTES

F(N)  is  the  maximum  stack  depth,  and  turns  out  to  always  be  less  than  the  natural 
logarithm  of  (N+l)/(M+2). 
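
The overall structure (partition large subfiles, leave subfiles of at most M records for one final insertion sort, rely on sentinel records at both ends) can be sketched in Python. This is an illustration under assumptions, not the specified program: records are Python strings compared on characters 3 through 9, a plain list serves as the explicit stack, partitioning is skipped when N does not exceed M, and the sentinel records used in the example (`" " * 16` and `"~" * 16`) are arbitrary choices that sort below and above the digit keys generated there.

```python
import random


def record_key(rec: str) -> str:
    return rec[3:10]                      # 7-character key in positions 3 through 9


def quicksort(rec: list, m: int) -> None:
    """Sort rec[1..n] in place; rec[0] and rec[n+1] are low and high sentinels."""
    n = len(rec) - 2
    key = record_key
    stack = []
    l, r = 1, n
    while n > m:                          # assumption: skip partitioning for short files
        i, j, v = l, r + 1, key(rec[l])
        while True:
            i += 1
            while key(rec[i]) < v:
                i += 1
            j -= 1
            while key(rec[j]) > v:
                j -= 1
            if j <= i:
                break
            rec[i], rec[j] = rec[j], rec[i]
        rec[l], rec[j] = rec[j], rec[l]
        small, large = sorted([(l, j - 1), (j + 1, r)], key=lambda s: s[1] - s[0])
        if large[1] - large[0] + 1 <= m:          # both subfiles are short
            if not stack:
                break
            l, r = stack.pop()
        elif small[1] - small[0] + 1 <= m:        # only the larger one needs partitioning
            l, r = large
        else:
            stack.append(large)
            l, r = small
    # final insertion sort pass, right to left, relying on the high sentinel
    for i in range(n - 1, 0, -1):
        if key(rec[i]) > key(rec[i + 1]):
            v, j = rec[i], i + 1
            while True:
                rec[j - 1] = rec[j]
                j += 1
                if key(rec[j]) >= key(v):
                    break
            rec[j - 1] = v


rng = random.Random(1)
records = [" " * 16] + [f"AB_{rng.randrange(10**7):07d}......" for _ in range(200)] + ["~" * 16]
quicksort(records, 9)
print(all(record_key(a) <= record_key(b) for a, b in zip(records, records[1:])))
```

Pushing the larger subfile and continuing with the smaller one is what keeps the stack depth logarithmic in N.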

J.  ASCII  TO  FLOATING  POINT  CONVERSION 


J.1 INPUTS

N      Number of characters in the string

A1     Address of the character string; it may be assumed to be aligned on a word
       boundary.

A2     Address of a floating point number where the result will be placed. It may be
       assumed to be aligned on whatever boundary is appropriate for floating point
       numbers.


J.2  PROCESSING 
Input  is  of  the  form 

<optional  sign>  <decimal  digits>  . <decimal  digits> 

where  either  set  of  decimal  digits  may  be  omitted.  You  may  assume  that  the  input  is 
correctly  formatted  and  that  there  are  no  extra  characters  beyond  the  end  of  the  number. 
J.3 CONSTRAINTS

This  routine  is  reentrant  and  position-independent. 

procedure AFP(N, A1, A2)

integer NUMBER, POSITION
real RESULT, DIVISOR
boolean ISNEGATIVE

ISNEGATIVE ← false
POSITION ← 0

if first character of A1 is a sign character
then
    if sign character is "-"
        then ISNEGATIVE ← true
    end-if
    POSITION ← 1
end-if

NUMBER ← integer equivalent of characters POSITION to J-1 of A1,
         where character J of A1 is "."
RESULT ← floating point equivalent of NUMBER

"the following two steps can be done in parallel, if desired"

NUMBER ← integer equivalent of characters J+1 to N of A1
DIVISOR ← floating equivalent of 10**(N-J)

A2 ← RESULT + (floating equivalent of NUMBER) / DIVISOR
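
The conversion steps can be sketched in Python as follows. The function is illustrative only: it takes the string directly in place of N and A1, returns the value in place of storing through A2, counts the fractional digits explicitly to form the divisor, and applies the recorded sign as a final step, which is an assumption about how ISNEGATIVE is meant to be used.

```python
def afp(text: str) -> float:
    """Convert '<sign><digits>.<digits>' to a float, following the steps above."""
    position = 0
    is_negative = False
    if text[0] in "+-":
        is_negative = text[0] == "-"
        position = 1
    j = text.index(".")                       # the input is assumed to contain a point
    whole = text[position:j]
    fraction = text[j + 1:]
    result = float(int(whole)) if whole else 0.0
    number = int(fraction) if fraction else 0
    divisor = 10.0 ** len(fraction)           # one power of ten per fractional digit
    value = result + number / divisor
    return -value if is_negative else value   # assumption: the sign is applied last


print(afp("-12.5"), afp("3."), afp(".25"))
```

Either set of digits may be empty, which is why both conversions fall back to zero.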


K.  BOOLEAN  MATRIX  TRANSPOSE 


K.1 INPUTS

A1  Pointer  to  a word  of  storage 

A2  bit  number  within  word  A1  where  the  matrix  begins 

N size  of  the  boolean  matrix 
K.2 PROCESSING

Transpose  a tightly-packed  bit  matrix.  All  the  bits  of  the  first  row  are  contiguous,  and  are 
followed  immediately  by  the  bits  of  the  second  row. 

procedure BUT(N, A1, A2)
integer I, J

boolean B[1:N,1:N] beginning at bit A2 of word A1

do for all I and J such that (1 ≤ J ≤ N) and (J+1 ≤ I ≤ N)
    swap B[I,J] and B[J,I]
end-do
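
A Python sketch of the in-place transpose on a packed bit string is given below. It rests on assumptions that the specification leaves to each machine: the bits are held in a `bytearray`, bit 0 is the leftmost bit of the first byte, and indices are zero-based. The function name and its `start` parameter (the bit offset A2) are illustrative.

```python
def transpose_bits(bits: bytearray, start: int, n: int) -> None:
    """Transpose in place an n-by-n bit matrix stored row by row from bit `start`."""
    def get(pos: int) -> int:
        return (bits[pos // 8] >> (7 - pos % 8)) & 1      # assumption: bit 0 is leftmost

    def put(pos: int, value: int) -> None:
        mask = 1 << (7 - pos % 8)
        if value:
            bits[pos // 8] |= mask
        else:
            bits[pos // 8] &= ~mask & 0xFF

    for j in range(n):                    # zero-based here; the pseudocode counts from 1
        for i in range(j + 1, n):
            a, b = start + i * n + j, start + j * n + i
            bit_a, bit_b = get(a), get(b)
            put(a, bit_b)
            put(b, bit_a)


matrix = bytearray([0b01100000, 0b00000000])   # 3x3 matrix at bit 1: rows 110, 000, 000
transpose_bits(matrix, 1, 3)
print(format(matrix[0], "08b"), format(matrix[1], "08b"))
```

Only the elements below the diagonal are visited, so each off-diagonal pair is swapped exactly once.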


L.  VIRTUAL  MEMORY  SPACE  EXCHANGE 


L.1 PROCESSING

Write a miniature supervisor call handler which provides the two functions "call" and
"return". CALL is function 0, RETURN is function 1 (i.e. SVC 0, SVC 1 on the /370, TRAP 0,
TRAP 1 on the PDP-11). In the following, "segment" means whatever IBM means by a
segment on the /370, whatever INTERDATA means by the term on the 8/32, and one of 8
eight-kilobyte pages on the PDP-11. Parameters will be passed according to whatever
calling convention is used for ordinary subroutines on the machine in question.

CALL takes two parameters. CALLEE is an integer in the range 0 ≤ CALLEE ≤ 255. It is
the index of an entry in a table of address spaces maintained within the supervisor. (An
"address space" is a set of segments.) PARAMETER is an address in user space. CALL saves
enough information to restore the entire state of the caller (e.g., all the fixed point and
floating point registers, the program counter, and so on). It then establishes addressability
for the address space indicated by CALLEE. One of the segment pointers in the address
space will be marked in some way to indicate that it is null. It is replaced by a pointer to the
segment containing the word addressed by PARAMETER. It then begins executing the
address space at some arbitrary point.

RETURN takes no parameters. It restores the environment active before the previous call.

Calls may be nested up to eight deep; you need not check for this bound being exceeded.
You need not check for a RETURN with no matching CALL. You need not build the address
space table, but merely describe its format. You need not let an address space call itself
recursively, nor need you check for this condition.
This document specifies the test data which the routines must satisfactorily
process  to  certify  them  as  debugged.  Mail  the  final  listings  of  your  test  programs 
along  with  printout  demonstrating  their  correct  operation  to  Sam  Fuller.  The  data  sets 
marked  * will  be  used  to  compute  the  S,  M and  R measures  on  the  ARF  simulator. 
Those  data  sets  marked  '*■  will  be  used  to  compute  these  measures  by  hand  in  the 
event  that  the  simulator  is  not  available  in  time.  Where  the  program  specifications 
provide  pointers,  this  document  gives  the  values  pointed  at  by  the  pointer,  where 
appropriate.  This  is  only  for  convenience  and  does  NOT  mean  the  parameters  should  all 
be  passed  as  these  values  rather  than  as  pointers. 

A.  I/O Kernel, Four Priority Levels.

Test scenario: An interrupt is received on the lowest priority level. While this
interrupt is being processed a second interrupt is received on the highest level.
Processing of the lower priority interrupt is preempted by the higher priority
interrupt. Processing then continues until both interrupts have been processed, and
the machine is restored to its pre-interrupt state.

B.  I/O  Kernel,  FIFO  Processing. 

An  interrupt  is  received  on  one  of  the  four  devices.  While  this  interrupt  is  being 
processed,  another  is  received  on  another  device.  The  second  interrupt  is  queued,  and 
processing  continues  until  both  interrupts  are  processed,  and  the  machine  restored  to 
its  pre-interrupt  state. 

C.  I/O  Device  Handler. 

The  handler  is  called  with  the  following  inputs: 

Device:  tape 

Op:  0 (read) 

No-trans:  2 

Buffer: addr1, addr2, 0, ...

Length:  120,120,0,... 

Status:  0 

Working-Storage:  pointer 


The handler then sets up the two reads, carries out the processing needed to
perform them, and returns the completion status for the reads.
D. Large FFT. (See attached appendix for illustrations of how the FFT procedure
should transform the data, i.e. the X vector.)

In the following tests, the W vector will be as follows:

W(0) = (1.0, 0.0)

W(1) = (0.0245412, 0.9996988) = (cos π/128, sin π/128)

W(2) = W(1)^2

...

W(128) = W(1)^128 = (-1.0, 0.0)


*1) Test case: Step Function. Parameters are:

i. X: X(0) = (1.0, 0.0)

X(1) = (1.0, 0.0)

...

X(127) = (1.0, 0.0)
X(128) = (0.0, 0.0)

...

X(255) = (0.0, 0.0)

ii. W: W vector above

iii. N: 256

iv. WORK

2) Test case: t sin rts. Parameters are:

i. X: X(0) = (0.0, 0.0)

X(1) = (0.0, 0.0)

...

X(63) = (0.0, 0.0)
X(64) = (-1.0, 0.0)
X(65) = (0.0, 0.0)

...

X(127) = (0.0, 0.0)
X(128) = (1.0, 0.0)
X(129) = (0.0, 0.0)

...

X(191) = (0.0, 0.0)
X(192) = (-1.0, 0.0)
X(193) = (0.0, 0.0)

...

X(255) = (0.0, 0.0)

ii. W: W vector above

iii. N: 256

iv. WORK

+3) Test case: Small Step Function. Parameters are:

i. X: X(0) = (1.0, 0.0)

X(1) = (1.0, 0.0)
X(2) = (0.0, 0.0)

X(3) = (0.0, 0.0)

X(4) = (1.0, 0.0)


ii. W: W(0) = (1.0, 0.0)

W(1) = (0.3827, 0.9239) = (cos π/8, sin π/8)
W(2) = W(1)^2

...

W(4) = W(1)^4 = (0.0, 1.0)

...

W(8) = W(1)^8 = (-1.0, 0.0)

iii. N: 16

iv. WORK


E.  Character  Search. 

1)  Test  case:  Find  match.  Parameters  are: 

i.  SRCH-STR:  "Monday,  June  7th,  1976" 

ii.  SRCH-LNGTH:  22 

iii.  SRCH-ARG:  "day" 

iv.  ARG-LNGTH:  3 

v. LOC: Expected return value = 3

vi.  WORK 

**2)  Test  case:  No  match.  Parameters  are: 

i.  SRCH-STR:  "Carnegie-Mellon  U" 

ii.  SRCH-LNGTH:  17 

iii.  SRCH-ARG:  "CMU" 

iv.  ARG-LNGTH:  3 

v. LOC: Expected return value = -1

vi.  WORK 

+3) Test case: Match at attempt n (n > 1). Parameters are:

i. SRCH-STR: "Day in, Day out"

ii. SRCH-LNGTH: 15

iii. SRCH-ARG: "Day out"

iv. ARG-LNGTH: 7

v. LOC: Expected return value = 8

vi.  WORK 

4) Test case: Match at beginning of string. Parameters are:

i. SRCH-STR: "abcd"

ii. SRCH-LNGTH: 4

iii. SRCH-ARG: "ab"

iv. ARG-LNGTH: 2

v. LOC: Expected return value = 0

vi. WORK

5) Test case: Fail match at end of string. Parameters are:

i. SRCH-STR: "efgh"

ii. SRCH-LNGTH: 4

iii. SRCH-ARG: "hx"

iv. ARG-LNGTH: 2

v. LOC: Expected return value = -1

vi. WORK

6) Test case: Match first occurrence. Parameters are:

i. SRCH-STR: "Day in, Day out"

ii. SRCH-LNGTH: 15

iii. SRCH-ARG: "Day"

iv. ARG-LNGTH: 3

v. LOC: Expected return value = 0

vi. WORK

7) Test case: Deal with 0 length search string. Parameters are:

i. SRCH-STR: "Day in, Day out"

ii. SRCH-LNGTH: 0

iii. SRCH-ARG: "Day"

iv. ARG-LNGTH: 3

v. LOC: Expected return value = -1

vi. WORK


8) Test case: Deal with 0 argument search string. Parameters are:


i. SRCH-STR: "Day in, Day out"

ii. SRCH-LNGTH: 15

iii. SRCH-ARG: "Day"

iv. ARG-LNGTH: 0

v. LOC: Expected return value = 0

vi.  WORK 

*9) Test case: Match at attempt n (n > 1). Parameters are:

i. SRCH-STR: "Saving hot water saves gas. So take shorter showers,
or run a little less hot water in the tub.
Do full loads in your dishwasher and washing machine.
Fix leaky faucets. It helps to keep your gas water
heater at the normal setting or lower."

Note: Each line break in the above string is a space.

ii. SRCH-LNGTH: 240

iii. SRCH-ARG: "water heater"

iv. ARG-LNGTH: 12

v. LOC: Expected return value = 196

vi. WORK
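
The short test cases above can be checked against a minimal reference sketch such as the following. It is written in Python purely for illustration and is not one of the benchmark implementations; the function name char_search and the use of Python strings are assumptions. It takes the two strings and their lengths as in the parameter lists and returns the 0-based position of the first match, or -1.

```python
def char_search(srch_str, srch_lngth, srch_arg, arg_lngth):
    """Return the 0-based index of the first match, or -1 if there is none.

    Hypothetical reference model of the Character Search test program;
    the parameter names mirror SRCH-STR, SRCH-LNGTH, SRCH-ARG, ARG-LNGTH.
    """
    text = srch_str[:srch_lngth]
    arg = srch_arg[:arg_lngth]
    if arg_lngth == 0:
        # Test case 8: a zero-length argument is expected to match at 0.
        return 0
    # Try every starting position at which the argument could still fit.
    for start in range(srch_lngth - arg_lngth + 1):
        if text[start:start + arg_lngth] == arg:
            return start
    # Covers test case 7 as well: a zero-length search string never matches.
    return -1


if __name__ == "__main__":
    print(char_search("Monday, June 7th, 1976", 22, "day", 3))
    print(char_search("Carnegie-Mellon U", 17, "CMU", 3))
```

The explicit loop mirrors the "match at attempt n" wording of the test cases; each loop iteration is one attempt.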


F.  Bit  Test,  Set,  or  Reset. 

All  tests  given  here  count  from  the  left  hand  end  of  the  string.  However,  the 
results  expected  below  (i.e.  returned  code)  will  be  the  same  for  bits  counted  in  either 
direction.  The  programmer  must  verify  the  results  of  a routine  which  counts  bits  from 
the  low  order  end  of  the  word.  The  following  bit  string  is  used  in  all  of  the  next  six 
tests:  Consider  the  bits  to  be  grouped  in  16  bit  words: 

1010011100101111  0111001010011100  1110111000000110  1100101  non  1000 


1) Test case: Test 1 bit. Parameters are:

i. F: 1

ii. N: 19

iii. A1: Bit string above

iv. RC: Expected return value: 1

v. WORK

2) Test case: Test 0 bit. Parameters are:

i. F: 1

ii. N: 35

iii. A1: Bit string above

iv. RC: Expected return value: 0

v. WORK

*+3) Test case: Set 0 bit. Parameters are:

i. F: 2

ii. N: 21

iii. A1: Bit string above

iv. RC: Expected return value: 0

Also expect: Word 1 = 0111011010011100

v. WORK

4) Test case: Set 1 bit. Parameters are:

i. F: 2

ii. N: 34

iii. A1: Bit string above

iv. RC: Expected return value: 1

v. WORK

5) Test case: Clear 1 bit. Parameters are:

i. F: 3

ii. N: 0

iii. A1: Bit string above

iv. RC: Expected return value: 1

Also expect: Word 0 = 0010011100101111

v. WORK

**6) Test case: Clear 0 bit. Parameters are:

i. F: 3

ii. N: 16

iii. A1: Bit string above

iv. RC: Expected return value: 0

v. WORK
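
The six cases above can be traced with a small sketch like the one below. It is an illustrative Python model, not a benchmark implementation. It assumes that the function codes are 1 = test, 2 = set, 3 = clear (as the test titles suggest), that bits are counted from the left-hand end with 16 bits per word, and that the return code is the value of the bit before the operation. Only the first three words of the bit string are modelled, since the six tests address bits 0 to 35.

```python
WORD_BITS = 16


def bit_op(f, n, words):
    """Apply function code f to bit n and return the bit's previous value.

    Assumed codes: 1 = test, 2 = set, 3 = clear. Bit 0 is the leftmost
    bit of words[0]; `words` is modified in place for set and clear.
    """
    index, offset = divmod(n, WORD_BITS)
    mask = 1 << (WORD_BITS - 1 - offset)
    old = 1 if words[index] & mask else 0
    if f == 2:
        words[index] |= mask
    elif f == 3:
        words[index] &= ~mask & 0xFFFF
    return old


if __name__ == "__main__":
    words = [0b1010011100101111, 0b0111001010011100, 0b1110111000000110]
    for f, n in [(1, 19), (1, 35), (2, 21), (2, 34), (3, 0), (3, 16)]:
        print(f, n, bit_op(f, n, words))
    print(format(words[1], "016b"))
```

Running the six operations in order on one copy of the string reproduces the listed return codes, because none of the set or clear operations changes a bit that a later test inspects.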

G.  Runge-Kutta  Integration. 

*+1) Test case: Accuracy. Parameters are:

i. t0: 0.0

ii. y0: 0.0

iii. h: 0.0078125

iv. tmax:

v. y: Expected return value: 221&S

vi. WORK

2) Test case: t0 = tmax. Parameters are:

i. t0: 1.0

ii. y0: 0.0

iii. h: 0.01

iv. tmax: 1.0

v. y: -10.0 (Initial value) Expected return value: 0.010100333...

vi. WORK

*+3) Test case: Straight line. Parameters are:

i. t0: 0.0

ii. y0: -1

iii. h: 0.5

iv. tmax:

v. y: Expected return value: -3

vi. WORK
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i 


H. Linked List Insertion.

1) Test case: Insert first key in empty list. Parameters are:

i. LIST-CB: Contains: (0, 0, 0)

ii. NEW-ENTRY: (111, 0, 0)

2) Test case: Insert at head of list. Parameters are:

i. LIST-CB: Contains: (?, ?, 1)

ii. NEW-ENTRY: (45, 0, 0)

3) Test case: Append to list. Parameters are:

i. LIST-CB: Contains: (?, ?, 2)

ii. NEW-ENTRY: (150, 0, 0)

4) Test case: Insert in middle of list. Parameters are:

i. LIST-CB: Contains: (?, ?, 3)

ii. NEW-ENTRY: (100, 0, 0)

5) Test case: Duplicate key at head of list. Parameters are:

i. LIST-CB: Contains: (?, ?, 4)

ii. NEW-ENTRY: (45, 0, 0)

6) Test case: Duplicate key at tail of list. Parameters are:

i. LIST-CB: Contains: (?, ?, 5)

ii. NEW-ENTRY: (150, 0, 0)

7) Test case: Duplicate key in middle of list. Parameters are:

i. LIST-CB: Contains: (?, ?, 6)

ii. NEW-ENTRY: (111, 0, 0)

*+8) Test case: Insert in middle of list. Parameters are:

i. LIST-CB: Contains: (?, ?, 7)

ii. NEW-ENTRY: (75, 0, 0)

Note: As a second sequence of 8 tests use the keys in reverse order: 75, 111,
150, 45, 100, 150, 45, 111.
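
The sequence of eight insertions can be followed with a sketch such as the one below. It is a Python illustration under several assumptions that are not stated in this section: that LIST-CB holds (head pointer, tail pointer, entry count), that each entry holds (key, forward link, backward link), that the list is kept in ascending key order (which the test titles imply), and that a duplicate key is inserted after the existing entries with the same key. The class and function names are hypothetical.

```python
class Entry:
    """One list entry: (key, forward link, backward link); layout assumed."""

    def __init__(self, key):
        self.key = key
        self.next = None
        self.prev = None


class ListCB:
    """List control block: (head, tail, count); layout assumed."""

    def __init__(self):
        self.head = None
        self.tail = None
        self.count = 0


def insert(cb, entry):
    """Insert entry in ascending key order, after any equal keys."""
    node = cb.head
    while node is not None and node.key <= entry.key:
        node = node.next
    if node is None:
        # Append at the tail; this also covers the empty list.
        entry.prev, entry.next = cb.tail, None
        if cb.tail is None:
            cb.head = entry
        else:
            cb.tail.next = entry
        cb.tail = entry
    else:
        # Insert immediately before the first larger key.
        entry.next, entry.prev = node, node.prev
        if node.prev is None:
            cb.head = entry
        else:
            node.prev.next = entry
        node.prev = entry
    cb.count += 1


def keys(cb):
    """Return the keys in list order by following the forward links."""
    out, node = [], cb.head
    while node is not None:
        out.append(node.key)
        node = node.next
    return out


if __name__ == "__main__":
    cb = ListCB()
    for key in [111, 45, 150, 100, 45, 150, 111, 75]:
        insert(cb, Entry(key))
    print(keys(cb), cb.count)
```

Under these assumptions the entry count after each insertion matches the third field of LIST-CB shown in the next test case (1, 2, ..., 7).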


I. Quick Sort.

*1)  Test  case:  General  sort.  Parameters  are: 


B-7 


Test  Program  Specifications 


i. N: 30

ii. REC: See table below

iii. M: 6

iv. WORK


Note:  When  sorted,  the  numbers  in  positions  0 
ascending order of magnitude.


“ 0 *iV****** 

"48  mcw30++++++++ 
"3G  j j k I VVf 
"40  j sgcz++++++++ 
" 1 04  ym  I j ZiVVnVjVi’nViViV 
" 1 lG2kbuk++++++++ 
" 5G  qtn  f zyVf**Vfirfi'f ** 
"G0  qocwf++++++++ 
"28  i uyunvn'ovvwn’ov* 
"52  qggg ]++++++++ 
" 1 12zhrhb*')V*'>v**** 
" 124ZZZZZ++++++++ 


'24  frktb++++++++ 

'88  vbvbsi'tVnViV**** 
'4  aercr++++++++ 

'G4  roOKCiWfVn'nVVnVVf 
'92  vcobr++++++++ 
'80  tZUk  tVo'nVVoVVo'oV 

'84  uwbyb++++++++ 
"72  SohrgVri'n'n'n'o'nMf 
"IG  csqcw++++++++ 
" 1 BSyqmbaVfi’nViVVWn'nV 


3 of  each  record  will  be  in 


'G8  savwu+++++++i- 
' 1 20ZOUptniTr>ViV***iViV 
' 1 00X  j vgc++++++++ 
' 7G  t a s g ovnvivvnv iviv* 
'32  jhkpx++++++++ 

'44  I d VU  j in'n'riiitin'tit 
'20  cydhp++++++++ 
'12  bwh  ( Ci'o’oVVn’n'n'nV 
'9G  vrkzui++++++++ 
"8  a I VOZVrfnVVnVVo'nV 


*2)  Test  case:  General  sort.  Parameters  are: 


i. N:

ii. REC: See table below

iii. M: 2

iv. WORK


"0  ++++-(*+++'  , 

"48  mcuisoVfVfi'f ***■>'<*" , "24  frktbiVvVVnViVi'n’nv" , "G8  savuuvwnvvoviv**" , 
"3G  j jk I V++++++++" , "88  vbvbs++++++++" , "120zoupm++++++++" , 
" 1 24zzzzzVf*>v***>v*" , 


J.  ASCII  to  Floating  Point  Conversion  Routine. 

*+1) Test case: Ordinary (representable) F.P. number. Parameters are:

i. N: 7

ii. A1: "10.0625"

iii. A2: Expect: (#041041 #000000 on PDP 11)

Expect: (? on IBM 370, Interdata 8/32)

2) Test case: No <integer> part. Parameters are:

i. N: 5

ii. A1: ".0625"

iii. A2: Expect: (#037200 #000000 on PDP 11)

Expect: (? on IBM 370, Interdata 8/32)

3) Test case: No <fraction> part. Parameters are:

i. N: 3

ii. A1: "10."

iii. A2: Expect: (#041040 #000000 on PDP 11)

Expect: (? on IBM 370, Interdata 8/32)

4) Test case: Ordinary (non-representable) F.P. number. Parameters are:

i. N: 4

ii. A1: "0.01"

iii. A2: Expect: (#036443 #153412 on PDP 11)

Expect: (? on IBM 370, Interdata 8/32)

5) Test case: Ordinary F.P. number. Parameters are:

i. N: 9

ii. A1: "0.5009766" (= 2^-10 + 0.5)

iii. A2: Expect: (#040000 #040000 on PDP 11)

Expect: (? on IBM 370, Interdata 8/32)

6) Test case: Optional sign <->. Parameters are:

i. N: 4

ii. A1: "-2.0"

iii. A2: Expect: (#140400 #000000 on PDP 11)

Expect: (? on IBM 370, Interdata 8/32)

7) Test case: Optional sign <+>. Parameters are:

i. N: 4

ii. A1: "+2.0"

iii. A2: Expect: (#040400 #000000 on PDP 11)

Expect: (? on IBM 370, Interdata 8/32)

8) Test case: Floating point 0. Parameters are:

i. N: 3

ii. A1: "0.0"

iii. A2: Expect: (#000000 #000000 on PDP 11)

Expect: (? on IBM 370, Interdata 8/32)
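
The PDP-11 expected values can be cross-checked with a short sketch like the following. It is a Python illustration, not the benchmark routine. It assumes the PDP-11 single-precision format (sign bit, 8-bit excess-128 exponent, 23 stored fraction bits with a hidden leading bit, fraction normalised to the range 1/2 to 1) and assumes that excess fraction bits are truncated rather than rounded, which is what test case 5 appears to require. The function name pdp11_f_words is hypothetical.

```python
from decimal import Decimal
from fractions import Fraction


def pdp11_f_words(text):
    """Return the two 16-bit words of the value as 6-digit octal strings.

    Assumes PDP-11 single precision with truncation of excess bits.
    """
    value = Fraction(Decimal(text))  # exact decimal value of the string
    if value == 0:
        return ("000000", "000000")
    sign = 1 if value < 0 else 0
    mag = abs(value)
    exp = 0
    # Normalise so that 1/2 <= mag < 1 and value = mag * 2**exp.
    while mag >= 1:
        mag /= 2
        exp += 1
    while mag < Fraction(1, 2):
        mag *= 2
        exp -= 1
    frac24 = int(mag * (1 << 24))   # 24-bit fraction, truncated
    stored = frac24 & 0x7FFFFF      # drop the hidden leading bit
    bits = (sign << 31) | ((exp + 128) << 23) | stored
    return ("%06o" % (bits >> 16), "%06o" % (bits & 0xFFFF))


if __name__ == "__main__":
    for s in ["10.0625", ".0625", "10.", "0.01", "0.5009766",
              "-2.0", "+2.0", "0.0"]:
        print(s, pdp11_f_words(s))
```

Test case 5 is the one that distinguishes truncation from rounding: "0.5009766" is slightly larger than 2^-10 + 0.5, so rounding to nearest would set the lowest fraction bit.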


K.  Boolean  Matrix  Transpose. 

+1) Test case: Array starting off a word boundary. Parameters are:

i. N: 4

ii. A1:

1101 1111

1011 1000

1000 0100

1001 1101

iii. A2: 4

*2) Test case: Array > 1 word in length. Parameters are:

i. N: 9

ii. A1:

001010011 011010101

100101110 001110000

111000010 101101100

011100101 010110101

110111000 100011101

001011100 010011011

101110111 010101101

000001010 111000111

100111110 100100100

iii. A2: 0

3)  Test  case:  One  dimensional  array.  Parameters  are: 

i.  N:  1 

ii.  Al: 

10  10 

10  10 

iii.  A2:  4 
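
In test cases 1 and 2, the right-hand group of bits in each row equals the corresponding row of the transpose of the left-hand group, so the listing can be read as the input array beside its expected result. The sketch below checks that reading. It is a simplified Python illustration that models each row as a string of '0' and '1' characters; the real routine works on a packed bit array, and the role of A2 is not modelled here.

```python
def transpose(rows):
    """Transpose a square boolean matrix given as equal-length bit strings."""
    n = len(rows)
    return ["".join(rows[r][c] for r in range(n)) for c in range(n)]


# Left-hand and right-hand groups of test case 1.
A4 = ["1101", "1011", "1000", "1001"]
T4 = ["1111", "1000", "0100", "1101"]

# Left-hand and right-hand groups of test case 2.
A9 = ["001010011", "100101110", "111000010", "011100101", "110111000",
      "001011100", "101110111", "000001010", "100111110"]
T9 = ["011010101", "001110000", "101101100", "010110101", "100011101",
      "010011011", "010101101", "111000111", "100100100"]

if __name__ == "__main__":
    print(transpose(A4) == T4, transpose(A9) == T9)
```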


L.  Virtual  Memory  Space  Exchange. 

Will not be run on actual machines.




Appendix  C:  Calling  Sequence  Conventions 

This  appendix  assumes  knowledge  of  the  instruction  sets  of  the  candidate  architectures. 




In  order  to  debug  the  benchmark  programs,  it  was  necessary  to  write  driver  programs  to 
call  the  benchmarks  as  subroutines.  This  made  it  necessary  to  establish  conventions  for  the 
calling  sequences  that  would  be  used  to  invoke  the  benchmarks. 

At first it was decided to use the calling conventions of the language used to write the
drivers. On the 8/32 and the /360 the drivers were written in FORTRAN; on the PDP-11 they
were written in BLISS-11. For the canonical program

RTN(Al,...An) 

the  calling  sequence  for  the  PDP-11  was 


        MOV   address of downward-growing stack,R6
        ...
        MOV   A1,-(R6)        ; stack value of first argument
        ...
        MOV   An,-(R6)
        JSR   PC,RTN          ; push program counter
                              ; and jump to subroutine

On  entry  to  a subroutine,  register  6 pointed  to  a word  containing  the  address  of  the 
instruction  following  the  JSR.  The  address  of  the  last  argument  was  in  the  location  at  offset 
2 from  the  register  6,  the  next-to-last  at  offset  4,  and  so  on. 

For  the  IBM  360  the  calling  sequence  was 


        L     13,=A(SAVE)     load address of save area in R13
        L     1,=A(ARGS)      load address of argument list in R1
        L     15,=A(RTN)      get address of routine to R15
        BALR  14,15           jump to subroutine
        ...
        DC    A(A1)           fullword, address of argument 1
        DC    A(An)

Register  14  contained  the  address  of  the  instruction  following  the  subroutine  call;  register  15 
contained  the  address  of  the  first  instruction  of  the  subroutine  and  could  be  used  as  a base 
register.  Register  1 contained  a pointer  to  a list  of  addresses  of  arguments.  The  address  of 
the  first  argument  was  in  the  word  pointed  to  by  register  1;  the  address  of  the  second 
argument  was  in  the  word  at  offset  4 from  the  word  pointed  to  by  register  1,  and  so  on. 
Register 13 pointed at the base of an area used to store the general purpose registers. The
actual convention for the use of the area pointed to by register 13 is fairly complicated, and
involves chaining together pointers to such save areas. It was stated fairly early that we did
not want to impose the full burden of this convention on the programmers. They could
simply consider register 13 to be a pointer to a 16-word block where they could save the
general registers[1].

For  the  Interdata  8/32,  the  calling  sequence  was 

        L     12,address of start of downward-growing stack
        ...
        BAL   15,RTN
        DC    X'<2*N+2>'
        DAC   A1              address of first argument
        ...
        DAC   An
        next instruction

Register  12  points  to  the  top  of  a downward-growing  stack;  the  conventional  use  of  this 
stack  is  to  subtract  some  constant  from  the  stack  pointer  and  use  positive  offsets  from  the 
stack pointer as temporary storage[2]. Register 15 contains a pointer to the halfword
immediately  following  the  subroutine  call.  This  halfword  contains  information  relating  to  the 
number  of  parameters.  The  halfword  is  followed  by  a set  of  fullwords  containing  addresses 
of  parameters.  Because  the  instruction  and  the  following  halfword  of  information  need  only 
be  aligned  on  a halfword  boundary,  while  the  address  constants  must  be  aligned  on  a 
fullword  boundary,  the  following  two  situations  might  occur.  (A  vertical  bar,  |,  denotes  a 
fullword boundary, while a colon, :, denotes a halfword boundary).


        : BAL | DC : empty | DAC A1 | DAC A2 ...

        | BAL : DC | DAC A1 | DAC A2

[1] In the standard calling convention, one stores the registers starting at offset 12 from register
13 via

        STM   14,12,12(13)

[2] Unlike on the PDP-11, this is a software convention; on the PDP-11, the register 6 stack is used
by the hardware.
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In  both  cases,  register  15  points  to  the  DC  constant.  In  the  first  case,  the  first  address  is  at 
offset  2 from  register  15.  In  the  second,  it  is  at  offset  4.  It  is  necessary  for  the  program  to 
force  register  15  to  point  to  a consistent  place;  the  instruction 

        OHI   15,2            OR halfword immediate

as  the  first  instruction  of  the  subroutine  forces  register  15  to  point  to  the  halfword 
immediately  preceding  the  "DAC  Al".  Thus  the  address  of  the  first  parameter  is  contained  in 
the  word  at  offset  2 from  register  15;  the  second  is  at  offset  6,  and  so  on.  It  was  later 
decided  that  the  existing  calling  sequences  were  unfairly  penalizing  some  of  the  machines 
whose  drivers  imposed  complicated  or  inappropriate  calling  sequences.  In  particular, 

a.  On  the  PDP-11,  it  was  usually  unnecessary  to  use  the  "work  area"  parameter,  as  the 
register  6 stack  provided  a work  area.  On  the  8/32,  the  software  stack  based  on 
register  12  served  a similar  function.  Clearly,  on  the  /370  it  would  be  possible  to 
implement  a software  stack  like  on  the  8/32;  it  seemed  unfair  to  penalize  the  /370  on 
the  basis  of  a software  convention. 

b.  On  the  PDP-11,  the  BLISS  driver  was  passing  the  values  of  parameters  rather  than 
their  addresses.  This  eliminated  a level  of  indirection  which  perhaps  unfairly  penalized 
the  /370  and  the  8/32,  which  lack  indirection. 
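
The effect of the "OHI 15,2" instruction described earlier for the 8/32 can be checked with a few lines of arithmetic. The Python sketch below is only an illustration; the addresses are made-up examples and the function name is hypothetical. It models the two alignment situations: the DC halfword either lies directly before the first address constant, or lies on a fullword boundary and is followed by an empty halfword.

```python
def first_param_address(dc_address):
    """Address reached at offset 2 from register 15 after OHI 15,2.

    dc_address is the (halfword-aligned) address of the DC constant that
    register 15 points to on entry to the subroutine.
    """
    r15 = dc_address | 2   # OHI 15,2: OR the halfword immediate 2 into R15
    return r15 + 2         # first parameter address is at offset 2


if __name__ == "__main__":
    # DC just before a fullword boundary: the DAC follows at DC + 2.
    print(hex(first_param_address(0x1002)))
    # DC on a fullword boundary: empty halfword, then the DAC at DC + 4.
    print(hex(first_param_address(0x1004)))
```

In both situations the adjusted register 15 points at the halfword immediately preceding the first address constant, so the k-th address is at offset 2 + 4(k-1).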


To  relieve  these  problems,  the  calling  sequences  were  modified.  Programmers  were 
permitted  to  use  the  area  pointed  to  by  register  13  as  their  work  area  on  the  /370.  All 
"input"  parameters,  those  examined  by  the  routines  but  not  overwritten,  were  passed  by 
value  rather  than  by  reference.  Those  parameters  specified  as  being  addresses  in  the 
program  specifications,  as  well  as  those  parameters  into  which  results  were  to  be  stored, 
were  still  passed  by  reference.  For  example,  the  CHARACTER  SEARCH  program,  E,  had  the 
parameters 

1)  SRCHSTR,  the  address  of  a character  string 

2)  SRCHLNG,  the  length  of  SRCHSTR 

3)  SRCHARG,  the  address  of  another  string 

4)  ARGLNG,  its  length 

5)  LOC,  a place  to  store  the  result 

6)  WORK,  a pointer  to  extra  working  storage 

Under  the  original  conventions,  the  argument  list  might  look  like 
ARGLST   DC    A(SRCHSTR)
         DC    A(SRCHLNG)
         DC    A(SRCHARG)
         DC    A(ARGLNG)
         DC    A(LOC)
         DC    A(WORK)
         ...
SRCHSTR  DC    C'Search string'
SRCHLNG  DC    F'13'
SRCHARG  DC    C'Arg'
ARGLNG   DC    F'3'
LOC      DS    F
WORK     DS    ...

Under  the  new  conventions,  this  became 


ARGLST   DC    A(SRCHSTR)
         DC    F'13'
         DC    A(SRCHARG)
         DC    F'3'
         DC    A(LOC)
         DC    A(WORK)
         ...
SRCHSTR  DC    C'Search string'
SRCHARG  DC    C'Arg'
LOC      DS    F
WORK     DS    ...
Appendix D: S, M, and R Calculation Sheets

This appendix contains the actual S, M, and R measures for each instruction, addressing
mode, and register format for the three final candidate architectures. The instructions
preceding the calculation sheets assume the computations will be done manually, but in fact
the whole process was automated. If test programs were run on the ISP simulator (which
the majority were), the output from the ISP simulator was read directly by the summary
calculation program, minimizing the chance of making a clerical error. If the instruction
counts had to be done by hand, then the instruction counts were typed into a disk file and
then the file was input to the summary calculation program.

The calculation of the appropriate M measure for each instruction and each addressing
mode is reasonably straightforward and fairly obvious. The R measure is another story,
since the R measure for each case is determined by effectively microcoding that instruction
and addressing mode in terms of the canonical microprocessor described in Section 3.2. Since
this amounts to programming, two different microcoders may implement the same instruction
in two different ways. To minimize this source of error, a single individual (W. E. Burr)
computed the R measures for all three final candidates, making every effort to be consistent
across the three architectures, and the calculations were reviewed by the respective
architecture chairmen.

The following calculations of the R measure for a basic instruction interpretation cycle and
address calculation for each of the three architectures illustrate the approach used to
determine the actual R measures given in this appendix.
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Instructions  for  Completing  the  IBM  S/370 
Calculation  Sheet 


To assist you in the computation of the S, M, and R measures for the IBM 370,
we have included here a set of calculation sheets (8 pages) and a liberal supply of work
sheets. The work sheets are designed to be directly attached to the listing of your
test program and have entries for each line of your program. Use as many work
sheets as you have pages of code in your listing. The calculation sheets will be used
to enumerate S, M, and R measures via addressing modes and instruction type.

The principal use of the work sheets will be to get accurate counts of
instruction executions. The calculation sheets will be used to determine the final S, M,
and R measures. Later we will indicate how to obtain these measures from your work
sheets directly, but this will be used primarily as a check on your calculations.

1.  You  will  need  to  refer  to  most  of  the  calculation  sheets  at  the  same  time,  so  you 
should  work  at  a desk  with  enough  space  to  spread  out  all  of  the  sheets  at  once. 
You  can  lay  aside  the  summary  calculation  sheet  (page  1)  since  you  will  not  need 
it  until  the  end.  You  will  also  need  the  CFA  Test  Data  Specifications  of  June  16, 
1976. 

2. Attach the work sheets to your program. The work sheets have a line for each
line of your program, and a column for the instruction type (Mode), the number of
times the instruction is executed (N), the storage measure (S), the memory transfer
measure (M), the register transfer measure (R), and the products N*M and N*R.

3.  Go  through  your  program  to  determine  how  often  each  instruction  will  be 
executed,  using  the  test  data  items  marked  with  a plus  (+)  in  the  specifications. 
Fill  in  the  N column  of  the  work  sheet  with  these  numbers.  Where  there  is  more 
than  one  set  of  test  data  for  a particular  program,  determine  the  number  of  times 
each  instruction  is  executed  for  each  set  of  data,  add  the  results  together,  and 
write  the  sum  in  the  N column.  If  your  program  is  simple  enough,  you  may  be 
able  to  do  an  analysis  to  determine  these  numbers  in  terms  of  the  size  of  the  test 
data  and  some  simple  characteristics  (such  as  the  number  of  zeros  in  the  matrix 
for  the  boolean  matrix  transpose  program).  Failing  that,  you  may  need  to  hand- 
simulate  the  operation  of  your  program  on  the  test  data. 

4. Perform steps 5 to 12 for each instruction in the program. Each step is labeled
with the name of the sheet on which the number calculated in the step is to be
written.

5. work sheet: Determine the type of the instruction (RR, RX, SI, RS, or SS) and mark
the type in the Mode column of the work sheet.

6.  calculation  sheet:  Find  the  row  of  the  Instruction  Format  Table  (page  2) 
corresponding  to  the  type  determined  in  step  5.  Add  one  to  the  "number  of 
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static  occurrences"  column  of  that  row,  and  add  the  number  in  the  N column  of 
the  work  sheet  to  the  "no.  of  times  executed"  column  of  the  row. 

7.  work  sheet:  Copy  the  number  from  the  "S/INST"  column  of  the  row  in  step  6 to 
the  S column  of  the  work  sheet. 

8.  calculation  sheet:  For  each  effective  address  calculation  in  the  instruction,  find  the 
appropriate  row  in  the  Effective  Address  Calculation  table  (page  2).  RR 
instructions  have  no  effective  address  calculations,  RX,  RS,  and  SI  have  one 
effective  address  calculation,  and  SS  have  two.  The  table  has  a separate  row  for 
pure  displacement  (no  registers),  base  plus  displacement  (one  register),  and  base, 
displacement,  and  index  (two  registers).  Add  one  to  the  "no.  of  times  executed" 
column  of  the  row. 

9. calculation sheet: Find the row of the Instruction Table (pages 3 to 7)
corresponding  to  the  instruction.  Add  the  number  in  the  N column  of  the  work 
sheet  to  the  "no.  of  times  executed"  column  of  this  row.  If  the  "R/INST"  and 
"M/INST"  columns  of  this  row  are  blank,  mark  the  row  (preferably  in  a colour  that 
stands  out),  write  zeros  in  these  two  columns,  and  on  a separate  sheet  make  note 
of  the  fact  that  the  instruction  was  missing. 

10. work sheet: Add the number in the "R/INST" column of the row found in step 9,
the  number  of  the  "R/CALC"  column  of  the  row  found  in  step  8,  and  the  number  in 
the  "R/INST"  column  of  the  row  found  in  step  6.  Write  this  sum  in  the  R column 
of  the  work  sheet. 

11.  work  sheet:  Add  the  number  in  the  "M/INST"  column  of  the  row  found  in  step  9 
and  the  number  in  the  "M/INST"  column  of  the  row  found  in  step  6.  Write  this 
sum  in  the  M column  of  the  work  sheet. 

12a. calculation sheet: If the entry in the Instruction Table (found in step 9) has an
asterisk (*), the M and R measures depend on some other count associated with
the instruction. For example, an STM (store multiple) instruction costs 4 bytes in
the M measure for every register stored. For each such starred instruction, look
up its entry in the "Miscellaneous M and R table" (page 8), calculate the
appropriate "what to count" number, multiply by the entry in the N column of the
work sheet, and add this number to the "count" column of the table.

12b.  work  sheet:  Multiply  the  "what  to  count  number"  determined  in  step  12a  by  the 
number  in  the  "M/COUNT"  entry  used  in  that  step,  and  add  this  product  to  the  M 
column  of  the  work  sheet. 

When you have completed these calculations for the entire program, compute
the totals in the tables as outlined in steps 13 to 19.

13.  calculation  sheet:  For  each  entry  of  the  Instruction  Table  (page  3 through  7), 
multiply  the  number  in  the  "no.  of  times  executed"  column  by  the  number  in  the 
"M/INST"  column  and  write  the  product  in  the  "M  subtotal"  column.  Multiply  the 
number  in  the  "no.  of  times  executed"  column  by  the  number  in  the  "R/INST" 
column  and  write  the  product  in  the  "R  subtotal"  column. 
14. calculation sheet: For each page of the Instruction Table, calculate the subtotals of
the "number of times executed", "M subtotal", and "R subtotal" columns, fill in the
subtotals at the bottom of the page (Cp1, Mp1, Rp1 for instruction page 1, and so
on), and write these subtotals in the space provided on the Summary Calculation
page (page 1).

15.  calculation  sheet:  On  the  Instruction  Format  Table  (page  2),  add  up  the  "S 
subtotals",  "number  of  times  executed",  "M  subtotal",  and  "R  subtotal"  columns  to 
give  Sj,  Cj,  Mj,  and  Rj.  Fill  in  the  corresponding  entries  on  the  Summary 
Calculation  page. 

16. calculation sheet: On the Effective Address Calculation table, add up the columns
giving Ceac and Reac; copy these to the summary page.

17. calculation sheet: Add up the entries on the summary page to give the S, M, and R
measures.

18.  calculation  sheet:  Perform  the  consistency  checks  given  on  the  summary  page. 
You  should  get  Cj  equal  to  Cpj  + ...  + Cp^,  and  Cgg^  equal  to  Cpj^  + ^^SS' 

19.  work  sheet:  Multiply  the  N column  of  the  work  sheet  by  the  M column  and  write 
the  result  in  the  NM  column.  Multiply  the  N column  by  the  R column  and  write  the 
result  in  the  NR  column.  Calculate  the  sum  of  the  NM  column;  this  should  be  the 
same  as  the  final  M measure.  Similarly,  the  sum  of  the  NR  column  should  be  the 
same  as  the  final  R measure. 
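
The work-sheet check in step 19 amounts to a pair of weighted sums, which the following Python sketch illustrates. It is not the summary calculation program mentioned in this appendix. The tuple layout and the function name are assumptions, the per-line costs in the example are made-up placeholders rather than values from the calculation sheets, and treating S as the plain sum of the S column is also an assumption.

```python
def measures(rows):
    """Return (S, M, R) from work-sheet rows.

    Each row is (n, s, m, r): the execution count N and the per-line
    S, M and R entries, with M and R already including any step 12b
    additions. The layout is assumed for illustration.
    """
    s_total = sum(s for _, s, _, _ in rows)       # static storage
    m_total = sum(n * m for n, _, m, _ in rows)   # sum of the NM column
    r_total = sum(n * r for n, _, _, r in rows)   # sum of the NR column
    return s_total, m_total, r_total


if __name__ == "__main__":
    # Placeholder costs only; real values come from the calculation sheets.
    sheet = [
        (1, 4, 4, 10),    # executed once
        (16, 2, 2, 6),    # loop body line, executed 16 times
        (16, 4, 8, 12),   # loop body line, executed 16 times
    ]
    print(measures(sheet))
```

If the totals from the calculation sheets differ from these sums, a count was probably entered in the wrong row in steps 6 to 12.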
INSTRUCTION TABLE

June 30, 1976
Note: 0 := zero

IBM S/370 CALCULATION SHEET
Test Program:
Programmer:
Date:

INSTRUCTION (ALPHABETIZED BY MNEMONIC) | NO. TIMES EXECUTED | M/INST | M SUBTOTAL | R/INST | R SUBTOTAL


AD ADD NORMALIZED (long)

ADR ADD NORMALIZED (long)

ADD HALFWORD

ADD LOGICAL

ADD LOGICAL

ADD DECIMAL

ADD UNNORMALIZED

ADD UNNORMALIZED (short)

ADD UNNORMALIZED

B

BALR

BC

BCR

BCT

BRANCH AND LINK

BRANCH AND LINK

BRANCH AND LINK

BRANCH ON CONDITION

BRANCH ON CONDITION

BRANCH ON COUNT

BRANCH ON COUNT

BRANCH ON COUNT

BRANCH ON INDEX HIGH

BRANCH ON INDEX LOW OR EQUAL

COMPARE

COMPARE



June 30, 1976

IBM S/370 CALCULATION SHEET

Test Program:

Programmer:

Date:

INSTRUCTION | NO. TIMES EXECUTED | M/INST | M SUBTOTAL | R/INST | R SUBTOTAL


LOAD HALFWORD

LOAD MULTIPLE

LOAD NEGATIVE (long)

LOAD NEGATIVE (short)

LOAD NE

LOAD POSITIVE (long)

LOAD POSITIVE (short)

LPR LOAD POSITIVE

LPSW LOAD PSW

LR LOAD

LRA LOAD REAL ADDRESS

LRDR LOAD ROUNDED (extended to long)

LRER LOAD ROUNDED (long to short)

LTDR LOAD AND TEST (long)

LOAD AND TEST (short)

LOAD A

M MULTIPLY

MC MONITOR CALL

MD MULTIPLY (long)

MULTIPLY (long)

MULTIPLY (short to long)

MULTIPLY (short to long)

MULTIPLY HALFWORD

MULTIPLY DECIMAL

MOVE (character)

MOVE LONG

MOVE (immediate)

MOVE NUMERICS

MOVE ZONES

MULTIPLY (long to extended)

MULTIPLY (long

MULTIPLY (extended)


AND 


Page  Subtotals 
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IBM  S/370  CALCULATION  SH££T 
Test Program:

Programmer:

Date:


MNEMONICS

SET CLOCK COMPARATOR
SUBTRACT NORMALIZED (long)
SUBTRACT NORMALIZED (long)
SUBTRACT NORMALIZED
SUBTRACT NORMALIZED
START I/O
START
SUBTRACT LOGICAL
SLA   SHIFT LEFT SINGLE
SLDA  SHIFT LEFT DOUBLE
SHIFT LEFT DOUBLE
SHIFT LEFT SINGLE LOGICAL
SUBTRACT LOGICAL
SUBTRACT DECIMAL
SET PSW KEY FROM ADDRESS
SET PROGRAM MASK


SET  c: 

• ppr  r T 


SHIFT RIGHT SINGLE


June 30, 1976


IBM S/370 CALCULATION SHEET

Test Program:

Programmer:

Date: 


INSTRUCTION 


Page  Subtotals 
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Instructions  for  Completing  the  DEC  PDP-11 
Calculation  Sheet 


To assist you in the computation of the S, M, and R measures for the DEC PDP-11,
we have included here a set of calculation sheets (8 pages) and a liberal supply of
work sheets. The work sheets are designed to be directly attached to the listing of
your test program and have entries for each line of your program. Use as many work
sheets as you have pages of code in your listing. The calculation sheets will be used
to enumerate S, M, and R measures via addressing modes and instruction type.

The principal use of the work sheets will be to get accurate counts of
instruction executions. The calculation sheets will be used to determine the final S, M,
and R measures. Later we will indicate how to obtain these measures from your work
sheets directly, but this will be used primarily as a check on your calculations.

1. You will need to refer to most of the calculation sheets at the same time, so you
should work at a desk with enough space to spread out all of the sheets at once.
You can lay aside the summary calculation sheet (page 1) since you will not need
it until the end. You will also need the CFA Test Data Specifications of June 16,
1976.

2. Attach the work sheets to your program. The work sheets have a line for each
line of your program, and a column for the instruction type (Mode), the number of
times the instruction is executed (N), the storage measure (S), the memory transfer
measure (M), the register transfer measure (R), and the products N*M and N*R.

3. Go through your program to determine how often each instruction will be
executed, using the test data items marked with a plus (+) in the specifications.
Fill in the N column of the work sheet with these numbers. Where there is more
than one set of test data for a particular program, determine the number of times
each instruction is executed for each set of data, add the results together, and
write the sum in the N column. If your program is simple enough, you may be
able to do an analysis to determine these numbers in terms of the size of the test
data and some simple characteristics (such as the number of zeros in the matrix
for the boolean matrix transpose program). Failing that, you may need to
hand-simulate the operation of your program on the test data.

4. Perform steps 5 to 11 for each instruction in the program. Each step is labeled
with the name of the sheet on which the number calculated in the step is to be
written.

5. work sheet: Determine the type of the instruction (d=double operand, s=single
operand, r=register operand, f=floating point, m=miscellaneous) and mark the
type in the Mode column of the work sheet.

6. calculation sheet: Find the S measure table on page 2. Add one to the "count"
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column  of  the  "instruction"  row.  For  each  immediate  or  absolute  operand  (modes 
2 and  3 using  the  PC)  add  one  to  the  "count"  column  of  the  "immediate  and 
absolute"  row.  For  each  indexed  and  indexed  deferred  operand,  add  one  to  the 
"indexed  operands"  row. 

7. work sheet: For each "one" added to the S measure table in step 6, count two,
and add the resulting number to the S column of the work sheet. This gives the
number of bytes used for the instruction.

8. calculation sheet: For each operand in the instruction, find the appropriate row in
the Address Mode table (page 2). Note that there are separate entries for byte
and word instructions for modes 2 and 4 (autoincrement and autodecrement). Add
one to the "no. of times executed" column of the row.

9. calculation sheet: Find the row of the Instruction Table (pages 3 to 10)
corresponding to the instruction. The instructions are grouped into tables based
on the type determined in step 5. Those instructions which have 6-bit operand
fields have two or three sections: one labeled "basic", and the others labeled
"source mode > 0", "dest mode > 0", or "address mode > 0". Add the number in
the N column of the work sheet to the "basic" field of the "no. of times executed"
column of this row. For double operand instructions, add N to the "source mode >
0" entry if the addressing mode of the source operand is not register mode (mode
0), and add N to the "dest mode > 0" entry if the addressing mode of the
destination operand is not register mode. The single operand, register operand,
and floating point instructions should be treated similarly. If the "R/INST" and
"M/INST" columns of this row are blank, mark the row (preferably in a colour that
stands out), write zeros in these two columns, and on a separate sheet make note
of the fact that the instruction was missing.

10.  work  sheet:  Add  the  number  in  the  "R/INST"  column  of  the  row  found  in  step  9 
and  the  number  in  the  "R/MODE"  column  of  the  row  found  in  step  8.  Write  this 
sum  in  the  R column  of  the  work  sheet. 

11.  work  sheet:  Add  the  number  in  the  "M/INST"  column  of  the  row  found  in  step  9 
and  the  number  in  the  "M/MODE"  column  of  the  row  found  in  step  8.  Write  this 
sum  in  the  M column  of  the  work  sheet. 
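
The per-instruction arithmetic of the preceding steps can be sketched as follows. This is an illustrative sketch only: the function names and every table value in the example are hypothetical, and summing the address-mode contribution over all operands of a multi-operand instruction is an assumption, since the steps name a single address-mode row.

```python
def pdp11_storage_bytes(immediate_or_absolute: int, indexed: int) -> int:
    """S column of the work sheet: two bytes per 'one' tallied in the S measure table.

    One is tallied for the instruction word itself, one per immediate or
    absolute operand, and one per indexed or indexed-deferred operand.
    """
    words = 1 + immediate_or_absolute + indexed
    return 2 * words


def per_execution_measure(inst_value: int, mode_values) -> int:
    """R or M column: instruction-table value plus address-mode value(s).

    Assumption: contributions are summed over all operands.
    """
    return inst_value + sum(mode_values)


# Hypothetical instruction: immediate source operand, indexed destination.
s_bytes = pdp11_storage_bytes(immediate_or_absolute=1, indexed=1)
# Hypothetical table entries (not taken from the calculation sheets).
r_value = per_execution_measure(inst_value=12, mode_values=[9, 15])
m_value = per_execution_measure(inst_value=2, mode_values=[2, 4])
print(s_bytes, r_value, m_value)
```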

When you have completed these calculations for the entire program, compute
the totals in the tables as outlined in steps 12 to 18.

12. calculation sheet: For each entry of the instruction tables (pages 3 through 10),
multiply the number in the "no. of times executed" column by the number in the
"M/INST" column and write the product in the "M subtotal" column. Multiply the
number in the "no. of times executed" column by the number in the "R/INST"
column and write the product in the "R subtotal" column. Treat the "basic" and
"mode > 0" entries for each instruction as separate entries for the purposes of
this calculation.

13.  calculation  sheet:  For  each  page  of  the  instruction  tables,  calculate  the  subtotals 
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of  the  "number  of  times  executed",  "M  subtotal",  and  "R  subtotal"  columns,  fill  in 
the subtotals at the bottom of the page, and write these subtotals in the space
provided on the Summary Calculation page (page 1).

14. calculation sheet: On the S Measure Table (page 2), add up the "S subtotal"
column to give S_I. Fill in the corresponding entry on the Summary Calculation
page.

15. calculation sheet: On the Address Mode Table, add up the columns giving
C_am, M_am, and R_am, and copy these to the summary page.

16. calculation sheet: Add up the entries on the summary page to give the S, M, and R
measures.

17. calculation sheet: Perform the consistency check given on the summary page.

18. work sheet: Multiply the N column of the work sheet by the M column and write
the result in the NM column. Multiply the N column by the R column and write the
result in the NR column. Calculate the sum of the NM column; this should be the
same as the final M measure. Similarly, the sum of the NR column should be the
same as the final R measure.
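
The totalling on the calculation sheets can be sketched as below. This is a hedged illustration: the function names and all tallies are hypothetical, and it assumes that the summary page simply adds the instruction-page subtotals and the Address Mode Table subtotals to obtain the final M and R measures.

```python
def table_subtotals(rows):
    """Subtotal one table of the calculation sheet.

    Each row is (times_executed, m_per_entry, r_per_entry).
    Returns (count subtotal, M subtotal, R subtotal).
    """
    count = sum(n for n, _, _ in rows)
    m_subtotal = sum(n * m for n, m, _ in rows)
    r_subtotal = sum(n * r for n, _, r in rows)
    return count, m_subtotal, r_subtotal


def final_m_and_r(instruction_pages, address_mode_rows):
    """Add the page subtotals and the address-mode subtotals (assumed rule)."""
    m_total = r_total = 0
    for rows in list(instruction_pages) + [address_mode_rows]:
        _, m_sub, r_sub = table_subtotals(rows)
        m_total += m_sub
        r_total += r_sub
    return m_total, r_total


# Hypothetical tallies; none of these numbers come from the CFA tables.
pages = [[(5, 2, 12), (3, 2, 14)], [(1, 4, 22)]]
address_modes = [(6, 2, 9), (2, 4, 17)]
print(final_m_and_r(pages, address_modes))
```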

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ADDRESS  MODE  TABLE 


ADDRESSING MODE    NO. TIMES EXECUTED    M/MODE    M SUBTOTAL    R/MODE    R SUBTOTAL

c.  : 

amo 

i 

1 

i 

1 

H 1 

I 

'Di 

2 (byte 
mode) 

Hi 

2 (word 
mode) 

m 

3 

2 

9 

4 (byte
mode) 

6 

4 (word 
mode) 

9 

Hi 

5 

2 

9 

6 

2 

15 

7 

4 

17 

C : 
am 

M : 
am 

851 

R : 

am 

S Measure Table


Item                                            Count    S/COUNT    S Subtotal

Instructions                                             2
Immediate and Absolute Operands (#n and @#n)             2
Indexed Operands (modes 6 and 7)                         2

                                                                    S_I
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Floating  Point  Instruction  Table  - Page  1 


Mne- 

monic 

Instruction 

1 

I . , L 

ABSD 

Mak*  abiolut#  (doubk)  | 

[ ; 

Ibasic 

1 

j mode  > 0 

ABSF 

Mako  Abiolula  (floating)  1 

1 basic  ^ 

1 mode  > 0 

No.  Times  I M/INST' M Sub-  R/INSTI  R Sub-' 
Executed  i I total  total  i 


ADOD  . Add  (doubi*) 


AODF  I Add  (floalinf ) 


I basic  

mode  > 0 

basic | 

mode  > 6 


j CFCC  I Copy  floaiinf  condil  ion  codos 


1 

1 CMPO 

Comparo  (doubW) 

' basic 

sz: 

••  2J__| 

1 

i 

1 mode  > 0^^ 

8 

! CMPF 

} 

Comparo  (floatinf) 

basic  ^ 

'v  2 

1 

' 17  i 

; i mode  > ^ \ ^ 1 

1 

1 DIVD 

, Divida  (doublo) 

I basic  ' 

33 

1 

■ mode  > 0 

8 

8 1 

DIVF  Divid*  (floating)  < bastc  * /X 

2 

21 

\ 

i • mode  > 0^>^r 

i 

t 4 

j LDCOF  Load,  eonvoftiof  doublo  to  fioatinf  b3SiC 
» I mode  > 0 
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Floating  Point  Instruction  Table  - Page  2 


Instruction 


No.  Times  M/1NST|M  Sub-  R/INST  i R Sub- 
Executed  total  I total 


LDD  • 

Load  (doubt*) 

^ 

basic 

25  ' ‘ 

mode  > 0 

8 

- 8 ! 

LDEXP 

Load  Exporionf 

basic 

5^:  r 

12  '' 

LDFPS  Load  ftoatmc  aUtua 

-basic  _ ^ 

mode  > 

MODO  Multiply  and  inlafriza  (doubla) 


MODF  Multiply  and  intafriza  (floatinf) 

i 


MULO  Multiply  (doiAla) 


basic 

mode  > O' 


^asic 

I mode  > 0^ 


! basic 
1 mode  > O ' 


MULF  i 

1 Mutliply  (floatiftf)  j 

i basic  1.^ 

! 

1 

! 1 

! mode  > 0]^^ 

— 

NEGO 

— 
Nagata  (doubla)  | 

i 

NEGF 

Nofata  (floatinf)  \ 

1 

j 

SETD 

Sat  doubla  moda  | 

SETF 

1 

Sat  Floatinf  moda 

mode  > 0 


SETI  ^ Sat  ahort  intafar  moda 


SETL  Sal  Ions  intatar  moda 


STCOI  Stora,  convartinf  doubla  to  intafar 

basic 

' 2 

■ 19 

1 

mode  > 0 

2 

■ - 2 , 

STCOL'  Stora.  convartinf  doubla  to  lonf 

basic 

! '> ! ' 10 : 

{ 

mode  > 0 

CxC  A ■ 

' - 2 

1 STCFO  Stora,  convartinf  floatinf  to  doubla  baSIC 

, I 

1 i mode  > 

8 

i - 8 ^ 
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Floating  Point  Instruction  Table  - Page  3 


! Mne- 
monic 

Instruction 

1 

i 

i 

No.  Times  U/INST  Im  Sub-  R/INST 
Executed  total 

R Sub- 
total 

STCFI 

Stor«,  €onv«rtin|  floatinf  to 

basic 

15 

mode  > 0 

2 

- 2 

1 

! STCFL 

L . 

Storo,  convortmf  floattnf  to  lonf 

basic 
mcde  > o' 

2 

4 

17 
■ - 4 

' STD 

Store  (double) 

basic 

'XT 

2 

25 

mode  > 0 

'S' 

- 8 

; STEXP 

Store  EKponent 

basic 

2 

12 

1 

mode  > 0 

1 

- 1 

— 

jSTF 

Store  (floattnf) 

basic 

2 

17 

1 

mode  > 0 

X 

4 

- 4 

I STFPS 

Store  tloattn{  status 

basic 

X 

2 

13 

i 

mode  > 

2 

- 2 

i STST 

Store  status  and  exception  address 

basic 

2 

13 

1 

!~subo 

mode  > 0 

x^. 

4 

, 0 

Subtract  (double) 

basic 



2 

33 

mode  > 0 

8 

8 

SUBF 

Subtract  (tloetin|) 

i 

' basic 

X 

2 

! 21 

• 

mode  > 

4 

. ^ 

TSTD  T#tt  (doub)«) 


, basic 


1 mode 


TSTF  Ttil  (ftoaiinf) 


■basic 
' mode  > O' 


COMB 

Compi«m«nf  Byi* 

' basic 

mode  > 0 

DEC 

i basic 

1 mode  > 0 

DECB 

0»cr»m«nt  ByU 

basic  ' 

1 mode  > 0 
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Single  Operand  Instruction  Table  - Page  1 


ADCB   Add carry byte                 basic / mode > 0
ASL    Arithmetic Shift Left          basic / mode > 0
ASLB   Arithmetic Shift Left Byte     basic / mode > 0
ASR    Arithmetic Shift Right         basic / mode > 0
ASRB   Arithmetic Shift Right Byte    basic / mode > 0
CLR    Clear                          basic / mode > 0
CLRB   Clear Byte                     basic / mode > 0
COM    Complement                     basic / mode > 0
INC    Increment                      basic / mode > 0
INCB   Increment Byte                 basic / mode > 0
JMP    Jump
JSR    Jump to Subroutine
NEG    Negate                         basic / mode > 0


R Sub- 
total 


Page  subtotals  X Csi  Cs.ami 


i 
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Single  Operand  Instruction  Table  - Page  2 


mode  > 0 


SBC  Subtract  carry  basiC 

mode  > 0 


SBCB 

SubUkcl  carry  bytt 

basic 

2 

! 

mode  > 0 

2 

0 

1 SWAB 

Swap  Bytvt 

basic 

2 

14 

1 

mode  > 0 

4 

O'  — 

' 1 
|SXT 

S'fn  txtand 

basic 

>< 

2 

! 11  ■ 

1 

1 

1 

mode  > 0 

2 

; -2 

TST 

T«it  1 

basic 

2 

1 11 

1 

1 

! 

1 mode  > 0 

2 

1 2 

TSTB 

Taat  Byt* 

basic 

2 ' 

1 10  ' 

i 

mode  > 0 

1 

1 

1 ■ 1 
’ i 

1 ^ L ' ! 

1 

Page  subtotals 

1 

Cs2 

0-22 


A 
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Miscellaneous  Instruction  Table 


Mnemonic   Instruction                No. Times Executed   M/INST   M Subtotal   R/INST   R Subtotal

BPT        Breakpoint Trap                                 6                     33
           Branches                                        2                     14
EMT        Emulator Trap                                   6                     33
IOT        I/O Trap                                        6                     33
RTI        Return from Interrupt                           6                     27
RTS        Return from Subroutine                          4                     22
SOB        Subtract One and Branch                         2                     17
TRAP                                                       6                     33

Page subtotals                        C_misc                        M_misc                R_misc


bit  set  (OR)  basic  count 

source  mode  ^ 0 
Jest,  mode  ■ 0 


u 

0 

i 

2 * 

12 

BIT  bit  test  (AND)  Ibasic  count 


1 

i 1 ^ 

2 

1 

1 

2 

1 i 

; 1 

1 

1 13  _ 

BITB  bit  test  (A'iD)  basic  count 

byte  source  mode  0 

dest.  mode  ■ 0 


CMP  compare  Ibasic  count 

Idest.  mode  • 0 


COMPB  compare  byte  basic  count 


MNE-  I 

MONIC  INSTRUCTION 


NO  TIMES 
SECUTED 


ADDRESS!  M/  M Sub- ( R/  « Sub 
BASIC  MODE  ' INST  Total  Inst  I Total 
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Instructions  for  Completing  the  Interdata  8/32 
Calculation  Sheet 


To assist you in the computation of the S, M, and R measures for the Interdata
8/32, we have included here a set of calculation sheets (8 pages) and a liberal supply
of work sheets. The work sheets are designed to be directly attached to the listing of
your test program and have entries for each line of your program. Use as many work
sheets as you have pages of code in your listing. The calculation sheets will be used
to enumerate S, M, and R measures via addressing modes and instruction type.

The principal use of the work sheets will be to get accurate counts of
instruction executions. The calculation sheets will be used to determine the final S, M,
and R measures. Later we will indicate how to obtain these measures from your work
sheets directly, but this will be used primarily as a check on your calculations.

1. You will need to refer to most of the calculation sheets at the same time, so you
should work at a desk with enough space to spread out all of the sheets at once.
You can lay aside the summary calculation sheet (page 1) since you will not need
it until the end. You will also need the CFA Test Data Specifications of June 16,
1976.

2. Attach the work sheets to your program. The work sheets have a line for each
line of your program, and a column for the instruction type (Mode), the number of
times the instruction is executed (N), the storage measure (S), the memory transfer
measure (M), the register transfer measure (R), and the products N*M and N*R.

3. Go through your program to determine how often each instruction will be
executed, using the test data items marked with a plus (+) in the specifications.
Fill in the N column of the work sheet with these numbers. Where there is more
than one set of test data for a particular program, determine the number of times
each instruction is executed for each set of data, add the results together, and
write the sum in the N column. If your program is simple enough, you may be
able to do an analysis to determine these numbers in terms of the size of the test
data and some simple characteristics (such as the number of zeros in the matrix
for the boolean matrix transpose program). Failing that, you may need to
hand-simulate the operation of your program on the test data.

4.  Perform  steps  5 to  12  for  each  instruction  in  the  program.  Each  step  is  labeled 
with  the  name  of  the  sheet  on  which  the  number  calculated  in  the  step  is  to  be 
written. 

5. work sheet: Determine the type of the instruction (RR, SF, RX1, RX2, RX3, RI1, or
RI2) and mark the type in the Mode column of the work sheet.

6. calculation sheet: Find the row of the Instruction Format Table (page 2)
corresponding to the type determined in step 5. Add one to the "no. static
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occurrences"  column  of  that  row,  and  add  the  number  in  the  N column  of  the  work 
sheet to the "no. times executed" column of the row.

7.  work  sheet:  Copy  the  number  from  the  "S/INST"  column  of  the  row  in  step  6 to 
the  S column  of  the  work  sheet. 

8. calculation sheet: For each effective address calculation in the instruction, find the
appropriate  row  in  the  Address  Calculation  table  (page  2).  RR  and  SF  instructions 
are  included  even  though  they  in  reality  have  no  address  calculation.  The 
addressing  modes  which  involve  index  or  base  registers  have  separate  rows 
depending  on  whether  the  register  involved  is  register  zero  or  not.  Add  one  to 
the  "no.  of  times  executed"  column  of  the  row. 

9.  calculation  sheet:  Find  the  row  of  the  Instruction  Table  (pages  3 to  9) 
corresponding  to  the  instruction.  Add  the  number  in  the  N column  of  the  work 
sheet  to  the  "no.  of  times  executed"  column  of  this  row.  If  the  "R/INST"  and 
"M/INST”  columns  of  this  row  are  blank,  mark  the  row  (preferably  in  a colour  that 
stands  out),  write  zeros  in  these  two  columns,  and  on  a separate  sheet  make  note 
of  the  fact  that  the  instruction  was  missing. 

10.  work  sheet:  Add  the  number  in  the  "R/INST"  column  of  the  row  found  in  step  9 
and  the  number  in  the  "R/INST"  column  of  the  row  found  in  step  8.  Write  this 
sum  in  the  R column  of  the  work  sheet. 

11. work sheet: Add the number in the "M/INST" column of the row found in step 9
and the number in the "M/INST" column of the row found in step 6. Write this
sum in the M column of the work sheet.

12a. calculation sheet: If the entry in the Instruction Table (found in step 9) has an
asterisk (*), the M and R measures depend on some other count associated with
the instruction. For example, an STM (store multiple) instruction costs 4 bytes in
the M measure for every register stored. For each such starred instruction, look
up its entry in the "Miscellaneous M and R table" (page 9), calculate the
appropriate "what to count" number, multiply by the entry in the N column of the
work sheet, and add this number to the "count" column of the table.

12b. work sheet: Multiply the "what to count" number determined in step 12a by the
number in the "M/COUNT" entry used in that step, and add this product to the M
column of the work sheet.
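
The handling of a starred instruction in steps 12a and 12b can be sketched as follows. Only the figure of 4 bytes per register stored for STM comes from the text; the function name, the base M value, the register count, and the execution count are hypothetical.

```python
def starred_instruction(n, base_m, what_to_count, m_per_count):
    """Return (increment for the table's 'count' column, work-sheet M value).

    n             -- N column of the work sheet
    base_m        -- M value already obtained in step 11
    what_to_count -- e.g. registers stored by one STM execution
    m_per_count   -- the M/COUNT entry for the instruction
    """
    count_increment = what_to_count * n               # step 12a
    m_column = base_m + what_to_count * m_per_count   # step 12b
    return count_increment, m_column


# STM costs 4 bytes of M per register stored (from the text).
# The base M of 4, six registers, and N = 3 are hypothetical.
print(starred_instruction(n=3, base_m=4, what_to_count=6, m_per_count=4))
```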

When you have completed these calculations for the entire program, compute
the totals in the tables as outlined in steps 13 to 19.

13.  calculation  sheet:  For  each  entry  of  the  Instruction  Table  (page  3 through  9), 
multiply  the  number  in  the  "no.  of  times  executed"  column  by  the  number  in  the 
"M/INST"  column  and  write  the  product  in  the  "M  subtotal"  column.  Multiply  the 
number  in  the  "no.  of  times  executed"  column  by  the  number  in  the  "R/INST" 
column  and  write  the  product  in  the  "R  subtotal"  column. 

14.  calculation  sheet:  For  each  page  of  the  Instruction  table,  calculate  the  subtotals  of 
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the  "number  of  times  executed",  "M  subtotal",  and  "R  subtotal"  columns,  fill  in  the 
subtotals at the bottom of the page (C_P1, M_P1, R_P1 for instruction page 1, and so
on), and write these subtotals in the space provided on the Summary Calculation
page (page 1).

15. calculation sheet: On the Instruction Format Table (page 2), add up the "S
subtotals", "number of times executed", and "M subtotal" columns to give S_I, C_I,
and M_I. Fill in the corresponding entries on the Summary Calculation page.

16. calculation sheet: On the Effective Address Calculation table, add up the columns
giving C_eac and R_eac, and copy these to the summary page.

17.  calculation  sheet:  Add  up  the  entries  on  the  summary  page  to  give  the  S,  M,  and  R 
measures. 

18. calculation sheet: Perform the consistency checks given on the summary page.
You should get C_I equal to C_P1 + ... + C_P7, and C_eac equal to
C_RR,SF + C_RX1,RX2,RI1 + C_RX3,RI2.

19.  work  sheet:  Multiply  the  N column  of  the  work  sheet  by  the  M column  and  write 
the  result  in  the  NM  column.  Multiply  the  N column  by  the  R column  and  write  the 
result  in  the  NR  column.  Calculate  the  sum  of  the  NM  column;  this  should  be  the 
same  as  the  final  M measure.  Similarly,  the  sum  of  the  NR  column  should  be  the 
same  as  the  final  R measure. 
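
The checks in steps 18 and 19 lend themselves to a mechanical test. The sketch below is illustrative only: the function name and all figures are hypothetical, and it covers the page-count identity and the two work-sheet cross-checks, not the address-calculation identity.

```python
def consistency_failures(c_i, page_counts, final_m, final_r, work_sheet):
    """Return a list of failed checks; an empty list means all passed.

    work_sheet is a list of (N, M, R) tuples, one per program line.
    """
    failures = []
    if sum(page_counts) != c_i:
        failures.append("C_I differs from the sum of the page counts")
    if sum(n * m for n, m, _ in work_sheet) != final_m:
        failures.append("NM column sum differs from final M")
    if sum(n * r for n, _, r in work_sheet) != final_r:
        failures.append("NR column sum differs from final R")
    return failures


# Hypothetical figures chosen so that every check passes.
sheet = [(1, 4, 10), (5, 6, 12), (5, 2, 8)]
print(consistency_failures(11, [6, 5, 0, 0, 0, 0, 0], 44, 110, sheet))
```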
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Instruction  Subtotal,  S| 

Constants and Local Variables:
Temporary Workspace:

S:


M Measure                                R Measure

Instruction Format Subtotal, M_I:        Instruction Format Subtotal, R_I:
Instruction Page Subtotal,   M_P1:       Instruction Page Subtotal,   R_P1:
                             M_P2:                                    R_P2:
                             M_P3:                                    R_P3:
                             M_P4:                                    R_P4:
                             M_P5:                                    R_P5:
                             M_P6:                                    R_P6:
                             M_P7:                                    R_P7:
Miscellaneous M Subtotal:                Miscellaneous R Subtotal:
M:                                       Address Calculation Subtotal, R_eac:

Consistency Checks on Calculations

You can use the following identities to provide some measure of assurance that you
have not made a clerical error in tabulating the S, M, and R measures:

(1) C_P1 + C_P2 + C_P3 + C_P4 + C_P5 + C_P6 + C_P7 = C_I

(2) C_RR + C_RX1 + C_RX3 = C_eac
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Instruction  Tsbie  - Page  1 


Mnemonic   Instruction                                 No. Times Executed   M/INST   M subtotal   R/INST   R subtotal

ABL        Add to Bottom of List
AE         Add Floating Point
AER        Add Floating Point Register
AH         Add Halfword
AHI        Add Halfword Immediate
AHM        Add Halfword to Memory
AI         Add Immediate
AIS        Add Immediate Short
AL         Autoload
AM         Add to Memory
AR         Add Register
ATL        Add to Top of List
BAL        Branch and Link
BALR       Branch and Link Register
BDCS       Branch to Control Store
BFBS       Branch on False Condition Backward Short
BFC        Branch on False Condition
BFCR       Branch on False Condition Register
BFFS       Branch on False Condition Forward Short
BTBS       Branch on True Condition Backward Short
BTC        Branch on True Condition
BTCR       Branch on True Condition Register
BTFS       Branch on True Condition Forward Short
BXH        Branch on Index High

Page subtotals
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Instruction  Table  - Page  2 


Mnemonic   Instruction

BXLE       Branch on Index Low or Equal
C          Compare
CBT        Complement Bit
CE         Compare Floating Point
CER        Compare Floating Point Register
CH         Compare Halfword
CHI        Compare Halfword Immediate
CHVR       Convert to Halfword Value Register
CI         Compare Immediate
CL         Compare Logical
CLB        Compare Logical Byte
CLH        Compare Logical Halfword
CLHI       Compare Logical Halfword Immediate
CLI        Compare Logical Immediate
CLR        Compare Logical Register
CR         Compare Register
CRC12      Cyclic Redundancy Check modulo 12
CRC16      Cyclic Redundancy Check modulo 16
           Divide
DE         Divide Floating Point
DER        Divide Floating Point Register
DH         Divide Halfword
DHR        Divide Halfword Register
DR         Divide Register
ECS        Enter Control Store

Page subtotals
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CacK«ft|«  Profram  Stalua  Rafiatar 


Exchange Byte Register
Exchange Halfword Register
Float Register
Fix Register
Load
Load Address
Load Byte
Load Byte Register
Load Floating Point
Load Floating Point Register
Load Halfword
Load Halfword Immediate
Load Halfword Logical
Load Immediate
Load Immediate Short
Load Multiple
LPSWR      Load Program Status Word Register
LR         Load Register
M          Multiply
ME         Multiply Floating Point
MER        Multiply Floating Point Register

Page subtotals

No. Times Executed

LME        Load Floating Point Multiple
LPSW       Load Program Status Word
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Instruction  Table  - Page  4 


Mnemonic   Instruction                        No. Times Executed

MH         Multiply Halfword
MHR        Multiply Halfword Register
MR         Multiply Register
N          AND
NH         AND Halfword
NHI        AND Halfword Immediate
NI         AND Immediate
RB         Read Block
RBL        Remove from Bottom of List
RBR        Read Block Register
RBT        Reset Bit
RD         Read Data
RDCS       Read Control Store
RDR        Read Data Register
RH         Read Halfword
RHR        Read Halfword Register

6 July 1976

Instruction Table - Page 5

Mnemonic   Instruction                        No. Times Executed   M/INST   M subtotal

RRL
RTL        Remove from Top of List
S          Subtract
SBT        Set Bit
SCP        Simulate Channel Program
SE         Subtract Floating Point
SER        Subtract Floating Point Register
SH         Subtract Halfword
SHI        Subtract Halfword Immediate
SI         Subtract Immediate
SINT       Simulate Interrupt
           Subtract Immediate Short
           Shift Left Arithmetic
           Shift Left Halfword Arithmetic
           Shift Left Halfword Logical
SLLS       Shift Left Logical Short
SR         Subtract Register
SRA        Shift Right Arithmetic
SRHA       Shift Right Halfword Arithmetic
SRHL       Shift Right Halfword Logical
SRHLS      Shift Right Halfword Logical Short
SRL        Shift Right Logical
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Mr't' 

monic 

Instruction 

SS         Sense Status
SSR        Sense Status Register
ST         Store
STB        Store Byte
STBR       Store Byte Register
STE        Store Floating Point
STH        Store Halfword
STME       Store Multiple Floating Point
SVC        Supervisor Call
TBT        Test Bit
THI        Test Halfword Immediate
TI         Test Immediate
TLATE      Translate (no branch)
TS         Test and Set
WB         Write Block
WBR        Write Block Register
WD         Write Data


WPCS  ConI'Ol 
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Instruction  Table  - Page  7 


Mne- 

monic 

Instruction 

No.  Times 
Executed 

M/INST 

1 M 

1 subtotal 

R/INST 

R 

subtotal 

XHI

Exclusive OR Halfword Immediate

m

10

XI

Exclusive OR Immediate

n

12

1

XR

Exclusive OR Register

• 

0 

12 

1 

1^ 

Page  subtotals  { 

L 

Cp7 
_ , 

H 

Rp7 

Miscellaneous  M and  R Table 
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APPENDIX  E - S,  M,  and  R MEASURES  FOR  EACH  TEST  PROGRAM 


INDIVIDUAL R MEASURES

                                                              Computer Architecture
Test Program                  IBM S/370                      PDP-11                              Interdata 8/32

A. Priority I/O Kernel        947[3], 2146[12], 3052[14]     108[4], 106[12], 106[14]            166[12], 166[17], 214[14]
B. FIFO I/O Kernel            2222[2], 4583[13], 2226[17]    1096[2], 810[3], 1419[13]           698[2], 937[4], 482[13]
C. I/O Device Handler         1789[1], 1729[17]              1480[1], 1416[17]                   1902[1], 1391[17]
D. Large FFT                  62904[11], 62904[9]*           70512[11], 70512[9]*                60446[11], 50045[9]*
E. Character Search           5603[1], 5549[4], 10239[11]    4348[1], 4326[11], 3091[17]         5885[1], 3139[3], 5767[11]
F. Bit Test, Set, Reset       1674[9], 1542[12], 1212[17]    832[3], 917[9], 801[12]             891[4], 887[9], 1167[12], 1281[11]A
G. Runge-Kutta Int.           845966[2], 1203952[17]         724372[2], 665529[3], 1012727[17]   babCl3bL2], 6960i»9[4], 777846[11]A, 874923[17]
H. Linked List Insertion      950[4], 1741[13], 1137[14]     1025[13], 1087[14], 1210[17]        834[3], 1049[13], 965[14]
I. Quicksort                  7618[5], 7540[6]               74278[5], 15205[6]                  13315[5], 9609[6]
J. ASCII to Float-Pt.         1330[4], 2578[5], 2226[7]      1726[5], 1512[7], 1716[17]          2100[3], 2270[5], 1897[17]
K. Boolean Matrix             5576[3], 5661[6], 5277[8]      3180[4], 3905[6], 4445[8]           2216[6], 3154[8], 3945[17]
L. Virtual Memory Exchange    1931[3], 1934[7], 2529[8]      2616[4], 2911[7], 4226[8]           2539[7], 4573[8], 2643[17]

INDIVIDUAL M MEASURES

                                                              Computer Architecture
Test Program                  IBM S/370                      PDP-11                              Interdata 8/32

A. Priority I/O Kernel        212[3], 354[12], 522[14]       28[4], 24[12], 24[14]               28[12], 32[14], 28[17]
B. FIFO I/O Kernel            424[2], 920[13], 434[17]       208[2], 188[3], 296[13]             192[2], 226[4], 114[13]
C. I/O Device Handler         328[1], 304[17]                309[1], 290[17]                     426[1], 279[17]
D. Large FFT                  10810[11], 10810[9]*           14746[11], 14746[9]*                10886[11], 8560[9]*, 8560[17]A
E. Character Search           854[1], 940[4], 1724[11]       730[1], 770[11], 520[17]            958[1], 1044[3], 1021[11]
F. Bit Test, Set, Reset       378[9], 358[12], 238[17]       162[3], 178[9], 152[12]             222[4], 176[9], 296[11]A, 276[12]
G. Runge-Kutta Int.           141074[2], 223056[17]          102662[2], 94960[3], 176960[17]     100062[2], 100042[4], 117984[11]A, 138414[17]
H. Linked List Insertion      228[4], 304[13], 264[14]       204[13], 218[14], 240[17]           224[3], 260[13], 238[14]
I. Quicksort                  1024[5], 1008[6]               14960[5], 2756[6]                   2968[5], 1732[6]
J. ASCII to Float-Pt.         241[4], 437[5], 433[7]         292[5], 275[7], 283[17]             363[3], 423[5], 334[7]
K. Boolean Matrix             332[3], 909[6], 896[8]         582[4], 776[6], 932[8]              384[6], 566[8], 640[17]
L. Virtual Memory Exchange    532[3], 532[7], 645[8]         541[4], 566[7], 945[8]              721[7], 1058[8], 780[17]

INDIVIDUAL S MEASURES

                                          Computer Architecture
Test Program                  IBM S/370       PDP-11          Interdata 8/32

A. Priority I/O Kernel        216 [3]         48 [4]          26 [12]
                              286 [12]        32 [12]         28 [14]
                              742 [14]        32 [14]         26 [17]

B. FIFO I/O Kernel            372 [2]         133 [2]         144 [2]
                              465 [13]        124 [3]         142 [4]
                              308 [17]        246 [13]        96 [13]

C. I/O Device Handler         192 [1]         132 [1]         176 [1]
                              252 [17]        216 [17]        241 [17]

D. Large FFT                  454 [11]        766 [11]        550 [11]
                              454 [9]*        766 [9]*        402 [9]
                                                              402 [17]A

E. Character Search           104 [1]         88 [1]          120 [1]
                              92 [4]          136 [11]        144 [3]
                              154 [11]        90 [17]         168 [11]

F. Bit Test, Set, Reset       144 [9]         68 [3]          82 [4]
                              122 [12]        78 [9]          90 [9]
                              116 [17]        86 [12]         98 [11]A
                                                              98 [12]

G. Runge-Kutta Int.           202 [2]         184 [2]         166 [12]
                              238 [17]        172 [3]         105 [4]
                                              248 [17]        232 [11]A
                                                              190 [17]

H. Linked List Insertion      144 [4]         162 [13]        148 [3]
                              228 [13]        182 [14]        198 [13]
                              176 [14]        194 [17]        164 [14]

I. Quicksort                  340 [6]         940 [6]         426 [6]
                              407 [5]         1534 [5]        524 [5]

J. ASCII to Float-Pt.         256 [4]         164 [5]         205 [3]
                              441 [5]         208 [7]         238 [5]
                              241 [7]         172 [17]        204 [7]

K. Boolean Matrix             224 [3]         174 [4]         156 [17]
                              267 [6]         232 [6]         130 [6]
                              284 [8]         284 [8]         180 [8]

L. Virtual Memory Exchange    292 [3]         254 [4]         328 [17]
                              382 [7]         250 [7]         310 [7]
                              414 [8]         378 [8]         334 [8]
